January 1984 


AT&T Vol. 63 No. 1
BELL LABORATORIES 


TECHNICAL 
JOURNAL 


A JOURNAL OF THE AT&T COMPANIES


Speech Recognition 

Packet Speech Transmission 
Trunk Implementation Planning 
TDMA Blocking System 
Queueing Approximation 


Inductive Noise in Chip Packages 


EDITORIAL COMMITTEE 


A. A. PENZIAS,¹ Committee Chairman


M. M. BUCHNER, JR.! D. HIRSCH* R. L. MARTIN! 
R. P. CLAGETT? S. HORING! J. S. NOWAK! 
R. P. CREAN? R. A. KELLEY! B. B. OLIVER? 
B. R. DARNALL! R. W. LUCKY! J. W. TIMKO? 
B. P. DONOHUE, II? J. F. MARTIN2 


¹AT&T Bell Laboratories   ²AT&T Technologies   ³AT&T Information Systems
⁴AT&T Consumer Products   ⁵AT&T Communications


EDITORIAL STAFF 
B. G. KING, Editor L. S. GOLLER, Assistant Editor 
P. WHEELER, Managing Editor H. M. PURVIANCE, Art Editor 
B. G. GRUBER, Circulation 


AT&T BELL LABORATORIES TECHNICAL JOURNAL (ISSN 0005-8580) is published by AT&T,
550 Madison Avenue, New York, NY 10022; C. L. Brown, Chairman and Chief Executive 
Officer; W. M. Ellinghaus, President; V. A. Dwyer, Vice President and Treasurer; T. O. Davis, 
Secretary. 


The Journal is published ten times each year. The Computing Science and Systems section 
and the special issues are included as they become available. Subscriptions: United States— 
1 year $35; 2 years $63; 3 years $84; foreign—1 year $45; 2 years $73; 3 years $94. A 
subscription to the Computing Science and Systems section only is $10 ($12 foreign). Single 
copies of most issues of the Journal are available at $5 ($6 foreign). Payment for foreign 
subscriptions or single copies must be made in United States funds, or by check drawn on a
United States bank and made payable to the Technical Journal and sent to AT&T Bell 
Laboratories, Circulation Dept., Room 1E335, 101 J. F. Kennedy Pky, Short Hills, NJ 07078. 


Single copies of material from this issue of the Journal may be reproduced for personal, 
noncommercial use. Permission to make multiple copies must be obtained from the Editor. 


Comments on the technical content of any article or brief are welcome. These and other 
editorial inquiries should be addressed to the Editor, AT&T Bell Laboratories Technical Journal, 
Room 1J319, 101 J. F. Kennedy Pky, Short Hills, NJ 07078. Comments and inquiries, whether 
or not published, shall not be regarded as confidential or otherwise restricted in use and will 
become the property of AT&T. Comments selected for publication may be edited for brevity, 
subject to author approval. 


Printed in U.S.A. Second-class postage paid at Short Hills, NJ 07078 and additional mailing 
offices. Postmaster: Send address changes to the AT&T Bell Laboratories Technical Journal, 
Room 1E335, 101 J. F. Kennedy Pky, Short Hills, NJ 07078.


Copyright © 1984 AT&T. 


AT&T BELL LABORATORIES 


Technical Journal 


Vol. 63    JANUARY 1984    No. 1

Copyright © 1984 AT&T. Printed in U.S.A.

A New Beginning

A Probabilistic Model for the Performance of Word Recognizers
A. E. Rosenberg

A Simulation-Based Comparison of Voice Transmission on CSMA/CD Networks and on Token Buses    33
J. D. DeTreville

Trunk Implementation Plan for Hierarchical Networks    57
A. N. Kashper and G. C. Varvaloucas

Analysis of a Demand Assignment TDMA Blocking System    89
S. M. Barta and M. L. Honig

On Approximations for Queues, I: Extremal Distributions    115
W. Whitt

On Approximations for Queues, II: Shape Constraints    139
J. G. Klincewicz and W. Whitt

On Approximations for Queues, III: Mixtures of Exponential Distributions    163
W. Whitt

Computing Inductive Noise of Chip Packages    177
A. J. Rainal

PAPERS BY AT&T BELL LABORATORIES AUTHORS    197

CONTENTS, FEBRUARY ISSUE    205


A New Beginning 


This issue of the Journal continues a publishing venture that began in 1922. Behind us are sixty-two years of publishing the results of telecommunications research and development in the AT&T companies. This month, January 1984, we begin a new era, with new goals and a restructured company. And the Journal has a new name: the AT&T Bell Laboratories Technical Journal.

In the 1922 inaugural issue, the editor stated the rationale for establishing the Journal: to bring together in one place significant papers on electrical communications. From the perspective of today’s level of technical sophistication, the term “electrical communication” may seem somewhat archaic, but first-class research and engineering were done in those early years, resulting in innovations that laid the groundwork for the communications explosion we are witnessing today.

Creative research and development arise out of the needs of a culture and the talents of its human resources and, most importantly, out of the freedom of intellectual exchange that we value so highly. As participants in this very human process, the Journal staff and its advisers have been rewarded by the satisfaction of taking part in it.

Now we anticipate the great scientific and technological challenges 
that confront us in this new era. And we intend that the Journal— 
representing all of the AT&T companies—will continue to publish 
papers on the variety of technology being investigated to meet these 
challenges. 


Editor 
A Probabilistic Model for the Performance of Word Recognizers


By A. E. ROSENBERG* 
(Manuscript received February 7, 1983) 


This paper develops a probabilistic model to account for the error-rate behavior of isolated-word speech-recognition systems. It examines two kinds of errors: confusion error, an a priori characterization of a recognizer that measures differences between words; and recognition (rank) error, an a posteriori characterization that, in addition to differences between words, accounts for differences between different tokens of the same word. It is shown that both kinds of errors can be modeled by describing recognition trials as Bernoulli trials. Good models of error-rate behavior as a function of vocabulary size can be obtained if the distributions of confusion number and recognition (rank) number are taken to be mixtures of binomial distributions. Data obtained from a recent experiment in isolated-word recognition with a large vocabulary (1109 words) are used to evaluate the model. Experimental error-rate functions obtained for each of six talkers and three partitions of the vocabulary are fit, by means of an optimization algorithm, to model functions based on mixture distributions. The results indicate that two-way mixture distributions account quite well for the experimental performance results.


I. INTRODUCTION 


A critical concern in the study and development of automatic speech-recognition systems is the specification of their performance. Performance is typically specified by the recognition error rate, that is, the fraction of trials in a test of the system in which incorrect decisions are obtained. This specification should be accompanied by a description of the test vocabulary, the talker population, the talking environment, and other pertinent conditions relating to both the training and the testing of the system. The interaction among these factors and their effect on performance are not well understood. Indeed, altering a variable associated with any of these factors can change the performance of a system in drastic and often unpredictable ways.

A more general specifier of recognizer performance is the rate at which the best n choices offered by the recognizer contain the correct word. More recently, specifiers measuring “complexity”¹ and “efficiency”² have been introduced. The relationship among specifiers is another aspect of recognizer performance that is not well understood.

The purpose of this paper is to examine and establish probabilistic models that describe the performance of isolated-word speech-recognition systems and that relate the various performance specifiers. A distinction will be made between performance specifiers that characterize a system through its training phase and those that characterize its overall behavior in test use. We will focus on modeling performance as a function of vocabulary size for a given recognizer, a small population of talkers, and three types of vocabulary. Some speculation will be offered on the relation between the model parameters and the recognizer, talker, and vocabulary.

The paper is organized as follows. In Section II, performance measures are defined and the probabilistic models, which form the basis for describing the behavior of these measures as a function of vocabulary size, are introduced. In Section III we use data obtained in an experiment with a speaker-dependent isolated-word recognizer and a large vocabulary to illustrate the behavior of some of the performance measures and to evaluate how well the probabilistic models account for that behavior. Section IV presents a discussion that offers some speculation regarding the significance of the parameters that specify the models. Section V presents some conclusions.


II. DEFINITIONS AND PROBABILISTIC MODELS
2.1 Bernoulli trials as the basis for confusion and rank 


Suppose we have a vocabulary of $N$ words, $V = \{v_1, v_2, \ldots, v_N\}$. Let $d_{Ij}$ be a distance measure between a token of word $v_I$ and a token of word $v_j$. The source of this distance might be a perceptual experiment, a phonetic or linguistic measurement, or the output of an automatic recognizer. In what follows the distance is taken to be the output of a recognizer, and it will normally be assumed that the first index, $I$, refers to an input test word while the second index, $j$, refers to a (single) prototype word.

The experiment underlying the formulations that follow is the comparison of a token of an input word $v_I$ with the prototype for each of the remaining $N - 1$ words in the vocabulary.

Suppose we are concerned with a particular word, $v_I$, in the vocabulary. Consider two events

$$d_{Ij} < T, \qquad j \ne I \tag{1}$$

and

$$d_{Ij} < d_{II}, \qquad j \ne I, \tag{2}$$

where $T$ is some preassigned distance threshold and $d_{II}$ is a “self-distance”. Note that the self-distance, $d_{II}$, generally represents the distance between two different tokens of a word, $v_I$, and therefore must be greater than zero.

Now consider the number of occurrences of these events in the underlying experiment, defined as follows:

$$q_I(T) = \left|\{\, j \ne I : d_{Ij} < T \,\}\right| \tag{3}$$

and

$$r_I = \left|\{\, j \ne I : d_{Ij} < d_{II} \,\}\right|, \tag{4}$$

where $|\{\;\}|$ is the cardinality of the set in braces, that is, the count of the events it contains. Note that $q_I(T)$ is the basis for the notion of confusability or complexity introduced in Rabiner et al.,¹ whereas $r_I + 1$ is the rank of the correct word input to a recognizer. When $r_I = 0$, the best-matching reference prototype corresponds to the correct word $v_I$.

If, in eq. (3), both $I$ and $j$ represent reference word tokens, $q_I(T)$ can be considered to characterize a recognizer through the training phase of the system, in other words, an a priori characterization. In eq. (4), however, the self-distance, $d_{II}$, specifically represents the distance between a test word input and the reference prototype for that word. Thus $r_I$ characterizes a system in its test or use phase, and is therefore an a posteriori characterization.

Consider now a probabilistic formulation that can be applied to either of the events defined in (1) and (2). Define a random variable $X_{Ij}$ such that

$$X_{Ij} = \begin{cases} 1 & \text{if the event occurs} \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

with

$$\text{Prob}\{X_{Ij} = 1\} = p_I \tag{6}$$

and

$$\text{Prob}\{X_{Ij} = 0\} = 1 - p_I. \tag{7}$$
Each event [(1) or (2)] is thus considered to be a Bernoulli trial. We assume that the Bernoulli probability is independent of the reference word $j$. When referring to event (1), $p_I$ depends on $T$, but this dependence is omitted to keep the notation simple. The counts defined in eqs. (3) and (4) are thus sums over $j$ of the random variable $X_{Ij}$. We denote this sum generically as $s_I$, it being understood that $s_I = q_I(T)$ when referring to events of type (1), and $s_I = r_I$ when referring to events of type (2). We can see that

$$s_I = \sum_{j \ne I} X_{Ij}, \tag{8}$$

from which it follows that $0 \le s_I \le N - 1$. Given eqs. (6), (7), and (8), with the $N - 1$ trials treated as independent, we obtain

$$\mathcal{E}\{s_I\} = (N - 1)\,p_I \tag{9}$$

and

$$\text{Var}\{s_I\} = (N - 1)\,p_I(1 - p_I). \tag{10}$$


Also, the probability that $s_I$ assumes a particular value $k$, $0 \le k \le N - 1$, obeys the binomial probability law

$$\text{Prob}\{s_I = k\} = \binom{N-1}{k} p_I^{\,k} (1 - p_I)^{N-1-k}, \tag{11}$$

where $\binom{N-1}{k}$ is a binomial coefficient.

Two kinds of error measures are introduced. An error measure is a monotonically increasing function of $s_I$ that equals 0 when $s_I$ equals 0 and approaches or equals 1 when $s_I$ equals $N - 1$.

The first error measure, $e_I$, is defined as

$$e_I = 1 - \frac{1}{1 + s_I}, \tag{12}$$

from which it follows that $0 \le e_I \le 1 - 1/N$. When $s_I$ is understood to be $q_I(T)$, $e_I$ is similar to confusability or complexity error as defined in Rabiner et al.¹ When $s_I = r_I$, $e_I$ is related to the notion of “efficiency” introduced by Smith and Erman² to characterize recognizer performance. Using eqs. (11) and (12) it can be shown that
$$\mathcal{E}\{e_I\} = \sum_{k=0}^{N-1} \text{Prob}\{s_I = k\}\left(1 - \frac{1}{1+k}\right) = 1 - \frac{1 - (1 - p_I)^N}{p_I N}. \tag{13}$$

The second error measure is associated with the occurrence of any nonzero value of $s_I$, that is, any confusion, $q_I(T)$, at all, or any rank, $r_I + 1$, other than 1. Define

$$E_I = \begin{cases} 1 & \text{if } s_I > 0 \\ 0 & \text{if } s_I = 0. \end{cases} \tag{14}$$


This is the conventional, or standard, recognition error generally used to characterize the performance of automatic recognizers. From eq. (11) we have

$$\text{Prob}\{s_I > 0\} = 1 - \text{Prob}\{s_I = 0\} = 1 - (1 - p_I)^{N-1}. \tag{15}$$

It then follows that

$$\mathcal{E}\{E_I\} = 1 - (1 - p_I)^{N-1}. \tag{16}$$


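Recognition-error rates are often calculated on the basis of whether the correct word is included among the top $c$ choices provided by the recognizer, that is, whether $s_I = 0, 1, 2, \ldots, c - 1$. This represents a generalization of the preceding definition of $E_I$, expressed by

$$E_I(c) = \begin{cases} 1 & \text{if } s_I \ge c \\ 0 & \text{if } s_I < c. \end{cases} \tag{17}$$

Since

$$\text{Prob}\{s_I \ge c\} = \sum_{k=c}^{N-1} \binom{N-1}{k} p_I^{\,k} (1 - p_I)^{N-1-k}, \tag{18}$$

we have

$$\mathcal{E}\{E_I(c)\} = \sum_{k=c}^{N-1} \binom{N-1}{k} p_I^{\,k} (1 - p_I)^{N-1-k}. \tag{19}$$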
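As a numerical check on the single-word model, the sketch below evaluates eqs. (11), (13), (16), and (19) in Python, computing the efficiency error both by direct summation over $k$ and by the closed form. The function names and the values of $N$ and $p$ are hypothetical and chosen only for illustration; they are not taken from the experiment.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Prob{s_I = k} for s_I ~ Binomial(n, p); eq. (11) with n = N - 1."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def expected_efficiency_error(N: int, p: float) -> float:
    """E{e_I} by direct summation over k (first form of eq. 13)."""
    n = N - 1
    return sum(binom_pmf(k, n, p) * (1.0 - 1.0 / (1 + k)) for k in range(n + 1))


def expected_efficiency_error_closed(N: int, p: float) -> float:
    """Closed form of eq. (13); valid for p > 0."""
    return 1.0 - (1.0 - (1.0 - p) ** N) / (p * N)


def expected_topc_error(N: int, p: float, c: int = 1) -> float:
    """E{E_I(c)} = Prob{s_I >= c}, eq. (19); c = 1 reduces to eq. (16)."""
    n = N - 1
    return sum(binom_pmf(k, n, p) for k in range(c, n + 1))


if __name__ == "__main__":
    N, p = 100, 0.005  # illustrative values only
    print(expected_efficiency_error(N, p), expected_efficiency_error_closed(N, p))
    print(expected_topc_error(N, p), 1.0 - (1.0 - p) ** (N - 1))
```

Because $e_I = 1 - 1/(1+s_I)$ is never larger than $E_I$ for any value of $s_I$, the efficiency error computed this way always comes out below the standard error for the same $N$ and $p > 0$.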


Although the models which are derived in this paper could easily 
include this generalization, we restrict our attention to the case for 
which c = 1, referred to as standard error. 


2.2 Mixture models 


The foregoing formulas pertain to a single word $v_I$ in a vocabulary $V$. Our object is to model the behavior of confusability or rank over an entire vocabulary $V$. It is therefore necessary to make some assumptions about the behavior of $p_I$ over all the words in $V$. The simplest possible assumption is that $p_I$ is constant over $V$, i.e., $p_I = p$ for $I = 1, 2, \ldots, N$. It will be shown in the following sections that this assumption leads to very poor models of the actual experimental behavior.

A more general assumption is the following. Assume that the Bernoulli probability defined in eq. (6) is itself a random variable, $p_I$, which may assume different values from word to word in a vocabulary or, indeed, from trial to trial of the same word. Suppose there are $M$ values that $p_I$ can assume, $p_m$, $m = 1, 2, \ldots, M$,* such that

$$\text{Prob}\{p_I = p_m\} = h_m, \qquad m = 1, 2, \ldots, M, \tag{20}$$

with

$$\sum_{m=1}^{M} h_m = 1, \tag{21}$$

where $h_m$ is the probability that $p_I$ assumes the value $p_m$. (It is possible to generalize still further by assuming $p_I$ to be continuously distributed.) It is now possible to generalize $s_I$ to $s_V$ over the entire vocabulary $V$. From eqs. (11) and (20) we obtain

$$\text{Prob}\{s_V = k\} = \sum_{m=1}^{M} h_m \binom{N-1}{k} p_m^{\,k} (1 - p_m)^{N-1-k}. \tag{22}$$


This expression represents a so-called compound binomial distribution, or mixture of binomial distributions.** With this interpretation, each time we perform the underlying experiment represented by (1) or (2), the probability assumes one of the $M$ values $p_m$ over all the $N - 1$ comparisons with the words in $V$. (This is in contrast to the situation in which the probability may assume different values for each comparison in the underlying experiment.) Using eq. (22), general expressions can be obtained for the mean and variance of $s_V$ and for the two generalized error formulations $e_V$ and $E_V$. All of these have the same form as eq. (22), that is, $\mathcal{E}\{\cdot\} = \sum_{m=1}^{M} h_m\, \mathcal{E}\{\cdot \mid m\}$. Thus,


$$\mathcal{E}\{s_V\} = (N - 1)\,p_V \tag{23}$$

and

$$\text{Var}\{s_V\} = (N - 1)\sum_{m=1}^{M} h_m p_m (1 - p_m) + (N - 1)^2 \sum_{m=1}^{M} h_m (p_m - p_V)^2, \tag{24}$$


* Note that the index m on p no longer refers in general to individual words in the 
vocabulary as in eq. (6). 
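where

$$p_V = \sum_{m=1}^{M} h_m p_m. \tag{25}$$

Also,

$$\mathcal{E}\{e_V\} = \sum_{m=1}^{M} h_m \left[1 - \frac{1 - (1 - p_m)^N}{p_m N}\right] = 1 - \sum_{m=1}^{M} h_m \frac{1 - (1 - p_m)^N}{p_m N} \tag{26}$$

and

$$\mathcal{E}\{E_V\} = 1 - \sum_{m=1}^{M} h_m (1 - p_m)^{N-1}. \tag{27}$$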


With $M$ set to 1, these expressions revert to the form of the earlier expressions for a single word $v_I$.
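The mixture expressions can be checked in the same way. The Python sketch below implements eqs. (22) through (27) for arbitrary mixing weights $h_m$ and probabilities $p_m$, so that the mean and variance in eqs. (23) and (24) can be compared with moments computed directly from the distribution of eq. (22). The function names are hypothetical, and the code assumes every $p_m > 0$ (needed by eq. 26).

```python
from math import comb
from typing import Sequence


def _check(h: Sequence[float], p: Sequence[float]) -> None:
    # Mixing weights must pair with the p_m values and satisfy eq. (21).
    if len(h) != len(p) or abs(sum(h) - 1.0) > 1e-9:
        raise ValueError("need len(h) == len(p) and sum(h) == 1 (eq. 21)")


def p_V(h: Sequence[float], p: Sequence[float]) -> float:
    """Average Bernoulli probability, eq. (25)."""
    _check(h, p)
    return sum(hm * pm for hm, pm in zip(h, p))


def mixture_pmf(k: int, N: int, h: Sequence[float], p: Sequence[float]) -> float:
    """Prob{s_V = k}, the mixture of binomials in eq. (22)."""
    _check(h, p)
    n = N - 1
    return sum(hm * comb(n, k) * pm**k * (1.0 - pm) ** (n - k) for hm, pm in zip(h, p))


def mean_sV(N: int, h: Sequence[float], p: Sequence[float]) -> float:
    """E{s_V} = (N - 1) p_V, eq. (23)."""
    return (N - 1) * p_V(h, p)


def var_sV(N: int, h: Sequence[float], p: Sequence[float]) -> float:
    """Var{s_V}, eq. (24): within-component plus between-component terms."""
    pv = p_V(h, p)
    within = (N - 1) * sum(hm * pm * (1.0 - pm) for hm, pm in zip(h, p))
    between = (N - 1) ** 2 * sum(hm * (pm - pv) ** 2 for hm, pm in zip(h, p))
    return within + between


def expected_eV(N: int, h: Sequence[float], p: Sequence[float]) -> float:
    """E{e_V}, eq. (26); every p_m must be positive."""
    _check(h, p)
    return 1.0 - sum(hm * (1.0 - (1.0 - pm) ** N) / (pm * N) for hm, pm in zip(h, p))


def expected_EV(N: int, h: Sequence[float], p: Sequence[float]) -> float:
    """E{E_V}, eq. (27)."""
    _check(h, p)
    return 1.0 - sum(hm * (1.0 - pm) ** (N - 1) for hm, pm in zip(h, p))
```

The second term of eq. (24) is the spread of the component means, which is why a mixture with the same $p_V$ as a single binomial produces a wider distribution of $s_V$. With `h = [1.0]` every function reduces to its single-word counterpart, as noted in the text.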


III. EXPERIMENTAL EVALUATION


The experimental data used in this study were obtained with the AT&T Bell Laboratories isolated-word recognition system based on Linear Predictive Coefficients (LPC).°® The vocabulary was the 1109-word so-called Basic English vocabulary of Ogden.’ The recognizer was used in a speaker-dependent mode. Six talkers, all native speakers of American English, three male and three female, participated in the experiment. Each talker trained the system using the robust training procedure of Rabiner and Wilpon,® giving a single reference prototype for each word in the vocabulary. In addition, four sets of test utterances were obtained from each talker over a four-week period. Both the training and test utterances were collected over dialed-up telephone lines using an ordinary telephone handset, with the talker seated in a sound booth. For each talker, each test word was input to the recognizer and compared with every reference prototype word for that talker. For each such comparison the recognizer provides a distance figure measuring how closely the test word matches a prototype word. In a typical recognition trial the word recognized is the one associated with the best-matching prototype word, that is, the one with the smallest distance. The raw experimental data consist of four 1109 × 1109 distance matrices for each of the six talkers.

The large size of the vocabulary provides an opportunity to investigate recognition performance over a variety of experimental conditions related to vocabulary size and content by choosing appropriate subsets of the whole vocabulary. A series of such experiments using this experimental database has been described in a previous report.*


SPEECH RECOGNITION 7 


In the present experiment we focus on three partitions of the 1109-word vocabulary: the 605 monosyllabic words contained in the vocabulary, the remaining 504 polysyllabic words, and the entire vocabulary itself. For each of these, randomly selected subsets of various sizes are chosen. The subset sizes are

N = 10, 20, 50, 100, 200, 400, (800), PARTSIZ,

where PARTSIZ = 605, 504, or 1109 for the monosyllabic, polysyllabic, and whole-vocabulary partitions, respectively. For each subset size $N$, a total of MT = min[50, PARTSIZ/N] subsets of words, selected at random without replacement from each partition, are specified. The same subsets are used for all test sets and talkers. Thus, in the aggregate, for each subset size $N$, results are obtained over $N \times$ MT different words, where $500 \le N \times \text{MT} \le$ PARTSIZ.
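The subset design can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure actually used: PARTSIZ/N is taken to be rounded down, the MT subsets are drawn as disjoint blocks of one random sample (which is what "without replacement" and "$N \times$ MT different words" imply), and a fixed seed stands in for however the same subsets were reused across talkers and test sets. The name `make_subsets` and the seed value are hypothetical.

```python
import random
from typing import List, Sequence


def make_subsets(partition: Sequence[str], N: int, max_subsets: int = 50,
                 seed: int = 1984) -> List[List[str]]:
    """Draw MT = min(max_subsets, floor(PARTSIZ / N)) disjoint random subsets of size N.

    Words are sampled without replacement, so the MT subsets together
    contain N * MT distinct words. The fixed seed (a hypothetical choice)
    makes the same subsets reappear for every talker and test set.
    """
    mt = min(max_subsets, len(partition) // N)
    rng = random.Random(seed)
    drawn = rng.sample(list(partition), N * mt)
    return [drawn[m * N:(m + 1) * N] for m in range(mt)]
```

For the whole 1109-word vocabulary and $N = 100$, this yields MT = 11 subsets covering 1100 distinct words; for $N = 10$ the cap of 50 applies and 500 distinct words are used.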

The experimental performance data that are presented in this paper 
are generally given as functions of subset size for each talker and 
vocabulary type, and represent an average over all the talker’s four 
test sets and vocabulary subsets for each subset size. 


3.1 Experimental performance measures 


This section presents experimental examples of the performance measures introduced in Section II. To recapitulate, confusion number and rank number are defined as follows:

1. $q_I(T)$: the confusion number for a given word $v_I \in V$ is the number of words (other than $v_I$) in a given vocabulary subset $V$ whose distance to the given word is less than some threshold $T$ [from eq. (3)].

2. $r_I$: the rank number for a given word $v_I \in V$ is the number of words (other than $v_I$) in a given vocabulary subset $V$ whose distance is less than the self-distance for $v_I$ [from eq. (4)].

Experimental averages are obtained as follows. Suppose word $v_I$ is included in $V_m(N)$, the $m$th vocabulary subset of size $N$, $m = 1, 2, \ldots, \text{MT}$, where MT is the total number of subsets of size $N$. The words in each subset $V_m(N)$ are selected at random without replacement from a vocabulary $V$ of total size $Q \ge N$. Then, given the confusion number and rank number, $q_{I,V_m(N),s,t}(T)$ and $r_{I,V_m(N),s,t}$, respectively, for word $v_I$ from subset $V_m(N)$, test set $t$, and talker $s$, the experimental averages are

$$\bar{q}_{s,V}(N, T) = \frac{1}{4}\sum_{t=1}^{4} \frac{1}{\text{MT}}\sum_{m=1}^{\text{MT}} \frac{1}{N}\sum_{I \in V_m(N)} q_{I,V_m(N),s,t}(T) \tag{28}$$

and

$$\bar{r}_{s,V}(N) = \frac{1}{4}\sum_{t=1}^{4} \frac{1}{\text{MT}}\sum_{m=1}^{\text{MT}} \frac{1}{N}\sum_{I \in V_m(N)} r_{I,V_m(N),s,t}, \tag{29}$$


8 TECHNICAL JOURNAL, JANUARY 1984 


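respectively. Similarly, for the two error measures that were introduced in Section II, the experimental averages are

$$\bar{e}_{q,s,V}(N, T) = 1 - \frac{1}{4}\sum_{t=1}^{4} \frac{1}{\text{MT}}\sum_{m=1}^{\text{MT}} \frac{1}{N}\sum_{I \in V_m(N)} \frac{1}{1 + q_{I,V_m(N),s,t}(T)} \tag{30}$$

and

$$\bar{e}_{r,s,V}(N) = 1 - \frac{1}{4}\sum_{t=1}^{4} \frac{1}{\text{MT}}\sum_{m=1}^{\text{MT}} \frac{1}{N}\sum_{I \in V_m(N)} \frac{1}{1 + r_{I,V_m(N),s,t}} \tag{31}$$

for the efficiency errors of confusion and rank, respectively, and

$$\bar{E}_{q,s,V}(N, T) = \frac{1}{4}\sum_{t=1}^{4} \frac{1}{\text{MT}}\sum_{m=1}^{\text{MT}} \frac{1}{N}\left|\{\, I \in V_m(N) : q_{I,V_m(N),s,t}(T) > 0 \,\}\right| \tag{32}$$

and

$$\bar{E}_{r,s,V}(N) = \frac{1}{4}\sum_{t=1}^{4} \frac{1}{\text{MT}}\sum_{m=1}^{\text{MT}} \frac{1}{N}\left|\{\, I \in V_m(N) : r_{I,V_m(N),s,t} > 0 \,\}\right| \tag{33}$$

for the standard errors of confusion and rank, respectively.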
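The averaging in eqs. (28) through (33) can be sketched as follows. The Python code assumes a hypothetical layout in which, for one subset, test set, and talker, `D[i][j]` holds the distance between the test token of word $i$ and the prototype of word $j$, so that `D[i][i]` is the self-distance. It computes the per-word counts of eqs. (3) and (4), the inner $1/N$ averages, and then the outer averages over test sets and subsets. All names are illustrative, not from the paper.

```python
from statistics import fmean
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def counts(D: Matrix, T: Optional[float] = None) -> Tuple[Optional[List[int]], List[int]]:
    """Per-word confusion numbers q_I(T) (eq. 3, if T is given) and rank numbers r_I (eq. 4)."""
    n = len(D)
    r = [sum(1 for j in range(n) if j != i and D[i][j] < D[i][i]) for i in range(n)]
    q = None
    if T is not None:
        q = [sum(1 for j in range(n) if j != i and D[i][j] < T) for i in range(n)]
    return q, r


def subset_averages(c: Sequence[int]) -> Tuple[float, float, float]:
    """Inner (1/N) averages for one (subset, test set, talker) cell:
    mean count, efficiency error, and standard error."""
    n = len(c)
    mean_count = sum(c) / n
    efficiency = 1.0 - sum(1.0 / (1 + x) for x in c) / n
    standard = sum(1 for x in c if x > 0) / n
    return mean_count, efficiency, standard


def experimental_average(cells: Sequence[Tuple[float, float, float]]) -> Tuple[float, ...]:
    """Outer averages over test sets t and subsets m. Every cell carries equal
    weight 1/(4 MT), so this reduces to a plain mean over all cells."""
    return tuple(fmean(cell[k] for cell in cells) for k in range(3))
```

For example, with the made-up matrix `[[0.2, 0.1, 0.5], [0.3, 0.25, 0.4], [0.6, 0.7, 0.1]]`, word 0 has one prototype closer than its own, giving rank numbers `[1, 0, 0]`. At $T = 0.35$, words 0 and 1 each have one confusable neighbor, giving confusion numbers `[1, 1, 0]`.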

Note that both experimental confusion and rank or recognition data 
are obtained by averaging over four test utterances. In the previous 
section we noted that confusion could be considered an a priori, 
training characterization of system performance if it is based on 
distances between prototypes. Since this is not the case here, the 
description of confusion does not strictly hold. However, we do not 
expect distances between test utterances of different words to be 
significantly different from distances between prototypes of the same 
words. This point is discussed again in Section IV. 

Fig. 1a shows, as functions of $N$, the experimental averages of confusion number and rank number for talker 3 and the whole vocabulary $V_W$, $\bar{q}_{3,V_W}(N, T)$ and $\bar{r}_{3,V_W}(N)$. Average confusion number is plotted for five threshold values, $T$ = 0.20, 0.25, 0.30, 0.35, and 0.40. For each of these plots, straight-line fits are obtained by least-squares regression. The fits are quite good: each has a correlation coefficient better than 0.9995 with the data, except for $\bar{q}_{3,V_W}(N, 0.20)$ and $\bar{r}_{3,V_W}(N)$, whose coefficients of fit are both about 0.998. The linear trend is predicted by the model, as expressed in eq. (23). The interpretation of the linearity is quite natural: as the size of the vocabulary grows, the number of confusable words, or the number of words that match the input token better than its own prototype does (the rank number of the input word), grows proportionately.
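The straight-line fits and their correlation coefficients can be reproduced with an ordinary least-squares regression. The Python sketch below is one such computation; it assumes the fitted line includes an intercept, which the text does not specify. Under eq. (23), the slope estimates $p_V$, and regressing on $N$ or on $N - 1$ gives the same slope. The function name is hypothetical.

```python
from math import sqrt
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line y ~ a + b*x; returns (slope b, intercept a, correlation r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / sqrt(sxx * syy)


# Usage on synthetic data that follow eq. (23) exactly (not experimental data):
sizes = [10, 20, 50, 100, 200, 400, 800, 1109]
avg_rank = [(N - 1) * 0.005 for N in sizes]
slope, intercept, r = fit_line(sizes, avg_rank)  # slope ~ 0.005, r ~ 1
```

Real averages scatter about the line, so their $r$ falls below 1, as the values quoted above show.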

For the same talker and vocabulary, and for the same set of thresholds, experimental averages of efficiency error, $\bar{e}_{q,3,V_W}(N, T)$ and $\bar{e}_{r,3,V_W}(N)$, are plotted in Fig. 1b, and of standard error, $\bar{E}_{q,3,V_W}(N, T)$ and $\bar{E}_{r,3,V_W}(N)$, in Fig. 1c, both as functions of $N$ on a logarithmic scale. The efficiency- and standard-error curves follow the same trends as functions of vocabulary size, increasing monotonically and approaching one asymptotically. The efficiency-error curves take uniformly smaller values than their standard-error counterparts for each value of $N$; for small $N$, each efficiency-error curve has approximately half the value of its standard-error counterpart. For confusion error, error increases monotonically as the distance threshold, $T$, is increased (relaxed).

[Figure 1 not reproduced. Panels: (a) average confusion or rank number; (b) average efficiency confusion error and efficiency recognition (rank) error; (c) average standard confusion error and standard recognition (rank) error. Panels (b) and (c) plot average error percentage against vocabulary size on a logarithmic axis from 10 to 2000. Confusion curves are shown for T = 0.20, 0.25, 0.30, 0.35, and 0.40.]

Fig. 1: (a) Average confusion and rank number, (b) average efficiency confusion and rank error, and (c) average standard confusion and rank error as functions of vocabulary size, for talker 3, vocabulary type $V_W$, and five values of threshold, $T$ (for confusion).

The standard recognition- or rank-error curve plotted in Fig. 1c is representative of results presented in the earlier report.’ In the present study, additional data values extend the curve to vocabulary sizes less than 100. The standard error rate for this talker is approximately 4 percent for 10-word vocabularies, 9 percent for size 100, and 20 percent for the full vocabulary of 1109 words. (The same curve is shown with an expanded error scale in Fig. 2.) In the earlier report it was suggested that, for vocabulary sizes greater than 100, doubling the size increases error by a constant amount, that is, error grows linearly with $N$ on a logarithmic scale. It can be seen here that, with the extension to smaller vocabulary sizes, this linear trend holds only approximately and over a restricted range.


3.2 The relation between recognition and confusion error 


The difference in form between the rank- or recognition-error curves 
and the confusion-error curves, for any threshold value, is quite 
marked. The relation between recognition error and confusion error 
as a function of vocabulary size is rather complex. 

The following are two hypotheses for relating recognition and confusion error. First, we might examine average confusion number as a function of distance threshold $T$ for any given vocabulary size and find the value of $T$ for which average confusion number equals average rank number. For the results shown in Fig. 1a for talker 3 and vocabulary $V_W$, this value of $T$ lies between 0.35 and 0.40. We could reason that confusion-error rates ought to equal recognition-error rates for a value of $T$ that, on the average, includes as many confusable words as the rank of the correct word. However, examining the error rates in Figs. 1b and 1c, we find that this hypothesis holds only for the very smallest vocabulary size, $N = 10$. The threshold suggested by this hypothesis leads to confusion-error rates much greater than the recognition-error rates actually observed for larger values of $N$.

The second hypothesis is that the threshold for which confusion-error rates ought to equal recognition-error rates can be found by associating the threshold with the average self-distance. We have calculated the average self-distance for this talker and vocabulary in the same way as the other experimental averages,

$$\text{DSLF}_{3,V_W}(N) = \frac{1}{4}\sum_{t=1}^{4} \frac{1}{\text{MT}}\sum_{m=1}^{\text{MT}} \frac{1}{N}\sum_{I \in V_m(N)} d_{II}. \tag{34}$$

We obtain $\text{DSLF}_{3,V_W}(N) \approx 0.235$ for all values of $N$. Examining the confusion-error rates in Figs. 1b and 1c, we find agreement with this hypothesis only for the very largest value of $N$, where recognition error is bracketed by the confusion-error rates for $T = 0.20$ and $T = 0.25$.

It is not surprising that average self-distance is independent of $N$, since the similarity between two tokens of the same word does not depend on the size of the vocabulary from which the words are taken. Nor is it surprising, as we have seen earlier, that the rate of increase in average rank number is independent of $N$. But neither of the associated thresholds is adequate to relate confusion and recognition for all values of $N$. The explanation lies in the following observations. Referring back to the basic definitions expressed in (1) and (2), if the individual self-distance, $d_{II}$, were absolutely constant from trial to trial and from word to word, there would be a threshold equal to this constant for which confusion-error rates would equal recognition-error rates. However, even though average self-distance is constant for all $N$, individual self-distances fluctuate widely from trial to trial, resulting in rank-number distributions that are also quite wide. Thus, for small subset sizes, these fluctuations produce an average rank number significantly greater than the average confusion number associated with a threshold equal to the average self-distance. As vocabulary size grows, however, so does word density; that is, the average distance between different words decreases. As this occurs, the self-distance fluctuations become less important than the errors attributable to the increasing density. Thus, for large vocabularies, a threshold equal to the average self-distance relates confusion error to recognition error.

The interpretation of the relation between confusion and recognition in the light of the model parameters will be taken up in the discussion, Section IV.


3.3 Estimation of model parameters and fits to experimental results 


It has already been shown that average confusion number and average rank number grow linearly with subset size, in agreement with the model as expressed in eq. (23). The slope of this linear function is an estimate of $p_V$, given in eq. (25), the average model Bernoulli probability of an error or confusion. Estimates of $p_V$ are obtained as the slopes of the regression-line fits shown in Fig. 1a. Table I shows these estimates along with the coefficients of fit (correlation coefficients). Since these are linear relations, they provide no information about the individual mixture probabilities, nor, indeed, about whether there are any mixtures at all.

The effect of mixtures becomes apparent when we attempt to model error-rate behavior as a function of vocabulary size. To illustrate the effect, the standard rank-error curve shown in Fig. 1c is displayed once more on an expanded error-rate scale in Fig. 2a. (We use rank error rate and recognition error rate interchangeably.) Along with this curve, we have plotted the function for expected standard error rate given in eq. (27), with $M$ set to 1, for four different values of $p$. These four values of $p$ bracket the range of the values of $p_V$ given in Table I. It is quite evident that the simple Bernoulli-trial model, obtained with $M$ set to 1 in eq. (27), cannot provide a good fit to the experimental results.

Table I: Linear growth of average confusion and average rank numbers

T        $p_V$ estimate      r (correlation coefficient)
Average confusion number
0.20     1.18 × 10⁻⁴         0.99784
0.25     5.43 × 10⁻⁴         0.99980
0.30     1.77 × 10⁻³         0.99995
0.35     4.63 × 10⁻³         0.99993
0.40     1.03 × 10⁻²         0.99994
Average rank number
n/a      6.67 × 10⁻³         0.99801

Fig. 2: Average standard recognition (rank) error as a function of vocabulary size for talker 3 and vocabulary type $V_W$, with (a) four different simple (one-way) models of standard error [eq. (27) with $M = 1$], and (b) one-, two-, and three-way mixture-model fits of standard error.

The recognition error-rate curve is plotted once again in Fig. 2b
along with the function of eq. (27) for three values of M, M = 1, 2,
and 3. Table II shows the h and p parameters selected for these
functions along with coefficients of fit. The parameters are obtained
by using an optimization routine’ to provide a best fit to the
experimental data. We employ the convention that
$p_1 \ge p_2 \ge \cdots \ge p_M$ and $h_M = 1 - \sum_{m=1}^{M-1} h_m$,
to satisfy eq. (21). Note that the fits obtained improve progressively
with increasing M, and the values of p_V obtained for M = 2 and M = 3
bracket the slope estimate for average rank number given in Table I.
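As a numerical companion to Table II, the sketch below evaluates the M-way standard-error curve and the average probability p_V from the tabulated parameters. It assumes that eq. (27) has the M-term form $1 - \sum_m h_m (1-p_m)^{N-1}$, whose M = 2 case is eq. (38), and that eq. (25) is $p_V = \sum_m h_m p_m$. The function names are illustrative only. Under these assumptions the recomputed p_V values land close to the printed 3.82 × 10^-3 and 8.35 × 10^-3. The small differences reflect rounding of the tabulated parameters.

```python
"""Sketch: M-way mixture standard error and p_V from Table II (talker 3, Vw)."""

# Listed mixing coefficients are h_1..h_{M-1}; h_M = 1 - sum(listed), as in the text.
TABLE_II = {
    1: ([], [2.61e-4]),
    2: ([0.085], [4.32e-2, 1.37e-4]),
    3: ([0.046, 0.093], [1.71e-1, 4.72e-3, 7.26e-5]),
}


def weights(h_listed):
    """Append h_M so that the mixing coefficients sum to one."""
    return list(h_listed) + [1.0 - sum(h_listed)]


def expected_std_error(n, h_listed, probs):
    """Assumed form of eq. (27): 1 - sum_m h_m (1 - p_m)^(N - 1)."""
    return 1.0 - sum(h * (1.0 - p) ** (n - 1) for h, p in zip(weights(h_listed), probs))


def average_probability(h_listed, probs):
    """p_V = sum_m h_m p_m."""
    return sum(h * p for h, p in zip(weights(h_listed), probs))


if __name__ == "__main__":
    for m, (h_listed, probs) in TABLE_II.items():
        pv = average_probability(h_listed, probs)
        curve = [round(100 * expected_std_error(n, h_listed, probs), 1) for n in (10, 100, 1000)]
        print(f"M={m}: p_V={pv:.3g}; error % at N = 10, 100, 1000: {curve}")
```

The one-way row reduces to a single Bernoulli probability, so its curve cannot bend the way the mixtures do. This is the shortcoming noted for Fig. 2a.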

Although it is clear that the three-way mixture model, M = 3,
provides a superior fit, hereafter we will provide only two-way mixture
model fits, M = 2. Models with larger M should fit better, because
they can accommodate and discriminate among more of the effects
that contribute to the error-rate functions. However, there are only
eight data points, which is a small number of points to support the
number of parameters associated with such models. In addition,
two-way models have an appeal of parsimony, since they may lead to
simpler or more direct interpretations of the parameters. Some
suggestions for interpretations are discussed in Section IV. Also,
although the two-way model fit is somewhat deficient for the case
shown here, the fits are quite adequate in most of the other
experimental results that are presented.

For the two-way model, we refer to p_1 and p_2, p_1 ≥ p_2, as the type 1
and type 2 population probabilities, respectively, and to h (dropping the
subscript) as the mixing coefficient for the two populations.

The optimization-fitting procedure is as follows. The function
that is minimized is the sum over the subset sizes of the squared
differences between the observed values and the calculated model
function values. This function, its gradient, initial values for the
parameters, and some convergence constants are supplied to the
optimization routine. Usually the routine is run several times for each
set of experimental points with different sets of initial values, to guard
against the optimized parameters representing a local rather than a
global minimum. In some cases, particularly for overall low error
rates, the minimization is relatively insensitive to variation of the h
parameter.

Table II—Model parameter estimates for average standard
recognition error

  M   h_1, h_2, ..., h_{M-1}   p_1, p_2, ..., p_M                          p_V            r
  1   —                        2.61 × 10^-4                                2.61 × 10^-4   0.9607
  2   0.085                    4.32 × 10^-2, 1.37 × 10^-4                  3.82 × 10^-3   0.9856
  3   0.046, 0.093             1.71 × 10^-1, 4.72 × 10^-3, 7.26 × 10^-5    8.35 × 10^-3   0.9990
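The least-squares criterion can be sketched as follows. In this sketch, an exhaustive search over a coarse grid replaces the gradient-based routine with multiple restarts described above. This is our substitution, not the authors' procedure. It always returns the best grid point, so it cannot stop at a local minimum, but its resolution is limited by the grid step. The grid limits and steps, the eight subset sizes, and the noise-free "observed" curve (generated from the M = 2 parameters of Table II) are all hypothetical choices for illustration.

```python
import math


def std_error(n, h, p1, p2):
    """Two-way mixture standard error, eq. (38)."""
    return 1.0 - h * (1.0 - p1) ** (n - 1) - (1.0 - h) * (1.0 - p2) ** (n - 1)


def sse(sizes, observed, h, p1, p2):
    """Criterion from the text: sum over subset sizes of squared differences."""
    return sum((std_error(n, h, p1, p2) - e) ** 2 for n, e in zip(sizes, observed))


def grid(start, stop, step):
    """Inclusive arithmetic grid, built by index to avoid accumulating float error."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]


def grid_fit(sizes, observed):
    """Best (h, p1, p2) on a grid in h and log10 p, with p1 >= p2 (hypothetical bounds)."""
    best_cost, best_params = math.inf, None
    for h in grid(0.005, 0.300, 0.005):
        for lp1 in grid(-3.0, -0.5, 0.05):
            for lp2 in grid(-5.5, -2.5, 0.05):
                if lp2 > lp1:
                    continue  # keep the convention p1 >= p2
                p1, p2 = 10.0 ** lp1, 10.0 ** lp2
                cost = sse(sizes, observed, h, p1, p2)
                if cost < best_cost:
                    best_cost, best_params = cost, (h, p1, p2)
    return best_cost, best_params


if __name__ == "__main__":
    sizes = [10, 20, 50, 100, 200, 500, 1000, 2000]   # hypothetical subset sizes
    truth = (0.085, 4.32e-2, 1.37e-4)                 # Table II, M = 2
    observed = [std_error(n, *truth) for n in sizes]  # synthetic, noise-free
    cost, (h, p1, p2) = grid_fit(sizes, observed)
    print(f"SSE={cost:.1e}  h={h:.3f}  p1={p1:.3g}  p2={p2:.3g}")
```

Searching p_1 and p_2 on a logarithmic scale reflects the several decades separating them. With real, noisy data, a finer local search started from the best grid points would be the natural next step.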


3.4 Two-way mixture model fits to experimental data 


This section presents a variety of experimental confusion- and 
recognition-error results as a function of vocabulary size, together with 
two-way mixture model fits. The object is to demonstrate that the 
model represents the error-rate behavior as a function of vocabulary 
size quite well, and to point out the effects of talker, vocabulary type, 
etc., on the parameter estimates obtained from the model. 


3.4.1 Model fits to recognition error as a function of talker and 
vocabulary type 


Recognition error rate results as a function of vocabulary size, both 
efficiency and standard error, are displayed in Figs. 3 and 4, together 
with model fits for each example. Figure 3 shows results for the three 
vocabulary types, the whole vocabulary, Vw, monosyllables, Vm, and
polysyllables, Vp, for three talkers. Figures 3a through 3c show
efficiency-error results for the three talkers while Figs. 3d through 3f
show standard-error results. Figure 4 presents results for all six talkers 
for a single vocabulary type, Vw. Fig. 4a presents efficiency-error 
results and Fig. 4b presents standard-error results. The three talkers 
selected for Fig. 3 are associated with median performances in Fig. 4. 
The performance trends of these three talkers for the three vocabulary 
types in Fig. 3 are representative of all six talkers. 

Recall once again the distinction between standard error and effi- 
ciency error. Standard error is based on a count of trials with nonzero 
rank, while efficiency error accounts for the distribution of all rank 
numbers over all the trials, and is therefore, in some sense, a finer 
characterization of error performance. The differences between the 
two are generally predictable, as pointed out in the previous section. 
Both kinds of error results are shown, principally, to compare the 
parameter estimates obtained for each model. 

The trend in error-rate performance as a function of vocabulary 
type for individual talkers presented in Fig. 3 is a familiar one. That 
is, performance degrades for any vocabulary size from the more
redundant to the less redundant vocabulary types, from Vp to Vw to Vm.

The performance of individual talkers for a single vocabulary type
presented in Fig. 4 shows considerable variability. The performance
of one talker, talker 4, is distinctly poorer than the others. The best
performances are obtained for talkers 1 and 2, while the remaining
three are grouped together in an intermediate range of performance.

Fig. 3—(a), (b), (c) Average efficiency recognition (rank) error, and (d), (e), (f) average standard recognition (rank) error as a function of
vocabulary size, with two-way mixture model fits, for three vocabulary types.

Fig. 4—(a) Average efficiency recognition (rank) error, and (b) average standard
recognition (rank) error as a function of vocabulary size, with two-way mixture model
fits, for six talkers and vocabulary type Vw.

Table III—Model parameter estimates for average efficiency recognition (rank) error

                   Vp                            Vw                            Vm
Talker     h      p_1           p_2           h      p_1           p_2           h      p_1           p_2
1          0.037  7.85 × 10^-3  4.92 × 10^-5  0.055  9.49 × 10^-3  9.80 × 10^-5  0.105  8.29 × 10^-3  1.96 × 10^-4
2          0.020  9.99 × 10^-4  2.53 × 10^-5  0.020  5.24 × 10^-3  4.17 × 10^-5  0.040  9.04 × 10^-3  9.31 × 10^-5
3          0.040  1.99 × 10^-2  5.24 × 10^-5  0.076  5.47 × 10^-2  1.68 × 10^-4  0.126  6.22 × 10^-2  4.36 × 10^-4
4          0.119  1.09 × 10^-1  5.07 × 10^-4  0.208  7.36 × 10^-2  4.21 × 10^-4  0.277  8.16 × 10^-2  9.40 × 10^-4
5          0.046  2.23 × 10^-2  1.74 × 10^-4  0.076  3.04 × 10^-2  1.99 × 10^-4  0.109  3.71 × 10^-2  4.79 × 10^-4
6          0.038  1.69 × 10^-2  5.07 × 10^-5  0.072  3.28 × 10^-2  1.52 × 10^-4  0.109  4.66 × 10^-2  4.16 × 10^-4
Mean       0.050  2.95 × 10^-2  1.43 × 10^-4  0.085  3.44 × 10^-2  1.75 × 10^-4  0.128  4.08 × 10^-2  4.27 × 10^-4
Std. dev.  0.035  3.98 × 10^-2  1.86 × 10^-4  0.064  2.62 × 10^-2  1.30 × 10^-4  0.079  2.91 × 10^-2  2.94 × 10^-4
Without talker 4
Mean       0.036  1.36 × 10^-2  7.03 × 10^-5  0.060  2.65 × 10^-2  1.32 × 10^-4  0.098  3.26 × 10^-2  3.24 × 10^-4
Std. dev.  0.010  8.92 × 10^-3  5.90 × 10^-5  0.024  1.99 × 10^-2  6.22 × 10^-5  0.033  2.37 × 10^-2  1.69 × 10^-4

                   Mean                                Standard Deviation
Talker     h      p_1           p_2           h      p_1           p_2
1          0.066  8.54 × 10^-3  1.14 × 10^-4  0.035  8.49 × 10^-4  7.48 × 10^-5
2          0.027  5.09 × 10^-3  5.34 × 10^-5  0.012  4.02 × 10^-3  3.54 × 10^-5
3          0.081  4.56 × 10^-2  2.19 × 10^-4  0.043  2.26 × 10^-2  1.97 × 10^-4
4          0.201  8.81 × 10^-2  6.23 × 10^-4  0.079  1.86 × 10^-2  2.78 × 10^-4
5          0.077  2.99 × 10^-2  2.84 × 10^-4  0.032  7.41 × 10^-3  1.69 × 10^-4
6          0.073  3.21 × 10^-2  2.06 × 10^-4  0.036  1.49 × 10^-2  1.89 × 10^-4

Model fits have been carried out, as described previously, for both
the efficiency and standard-error results for each of the six talkers
and each of the three vocabulary types. This includes all the results
shown in Figs. 3 and 4 plus the Vm and Vp results for talkers 1, 2, and
4. These are two-way mixture fits obtained by setting M to 2 in eq.
(26) for efficiency error and eq. (27) for standard error. Model
parameter estimates for efficiency error are presented in Table III. The
parameter estimates for standard error are not shown, but comparing
them to the efficiency-error estimates indicates general, if not
necessarily close, agreement. This agreement reinforces our assumption
that the models developed to account for both kinds of error functions
are substantially correct, since the same experimental data underlie
both performance measures.

Table IV compares model fits for average rank, efficiency recogni- 
tion error, and standard recognition error for each of the six talkers 
for Vw. The table presents estimates for p_V and coefficients of fit, r.
As in Table I, the estimates of p_V for average rank are obtained from
slope estimates for least-squares regression lines. For efficiency and
standard error, the p_V estimates are reconstructed using eq. (25). The
p_V estimates obtained from efficiency and standard-error parameters
are in fairly good agreement with each other, but are generally less
than half the values of the estimates obtained for average rank. This
discrepancy was pointed out in the previous section, where it was
implied to be related to how well two-way mixtures model the data
compared with models having larger values of M. An
examination of the model function fits in Figs. 3 and 4 and the 
coefficients of fit in Table IV indicates generally close agreement with 
the experimental results. The possible exceptions are associated with 
high error-rate performances (for example, for talkers 3 and 4). For 
these cases the fits are poorer for the standard-error functions than 
for the corresponding efficiency-error functions. 

Table IV—Comparison of model fits for average rank, efficiency
recognition error, and standard recognition error

            Average Rank           Efficiency Recognition    Standard Recognition
                                   Error                     Error
Talker      p_V            r       p_V            r          p_V            r
1           …              0.99705 6.11 × 10^-4   0.99905    5.54 × 10^-4   0.99942
2           …              0.97985 1.46 × 10^-4   0.99877    2.49 × 10^-4   0.99891
3           6.67 × 10^-3   0.99801 4.31 × 10^-3   0.99454    3.28 × 10^-3   0.98563
4           …              0.99994 1.57 × 10^-2   0.99522    1.26 × 10^-2   0.98681
5           …              0.99905 2.49 × 10^-3   0.99675    1.79 × 10^-3   0.99350
6           …              0.99997 2.49 × 10^-3   0.99881    1.97 × 10^-3   0.99742

As an aid to interpreting the model parameters, it would be useful to
detect, in the parameter estimates, significant trends associated with
the variation of experimental conditions. In particular, it would be
useful to identify those parameters that remain more or less constant
over a particular set of conditions. General trends are apparent. As
performance degrades, either from one talker to another or from one
vocabulary type to another, the estimates of each of the parameters,
h, p_1, and p_2, generally increase. Differential trends are harder to
detect. This set of estimates supports no definite conclusions, but
some of it is suggestive.

Table III provides means and standard deviations for each parameter
estimate across vocabulary types for each talker and across talkers
for each vocabulary type. Since there are only three vocabulary types,
caution should be exercised with respect to statistics over this variable.
If we use the ratio of the standard deviation to the mean for each
parameter as an indicator of variability, we find that p_1, across
vocabulary types, has consistently less variability than h or p_2, with
ratios generally less than 0.5 for both efficiency and standard error.
Across talkers, the indications are somewhat vaguer, chiefly because of
the especially large variability contributed by talker 4. If we disregard
the estimates for talker 4, which may be justified by the fact that the
two-way mixture fits are rather poor for this talker, then low variability
is indicated for the h parameter, and to a lesser extent for p_2, as
shown by the second set of means and standard deviations in the
table.
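The variability indicator used here, the ratio of standard deviation to mean, is easy to reproduce. The sketch applies it to talker 1's three efficiency-error parameter sets from Table III. It assumes the sample (n − 1) standard deviation, since that choice reproduces the tabulated talker 1 entries (0.035, 8.49 × 10^-4, 7.48 × 10^-5).

```python
from statistics import mean, stdev

# Table III, talker 1, efficiency recognition error: vocabulary -> (h, p1, p2).
TALKER_1 = {
    "Vp": (0.037, 7.85e-3, 4.92e-5),
    "Vw": (0.055, 9.49e-3, 9.80e-5),
    "Vm": (0.105, 8.29e-3, 1.96e-4),
}


def variability(values):
    """Ratio of the sample standard deviation to the mean."""
    return stdev(values) / mean(values)


# zip(*...) regroups the per-vocabulary triples into one tuple per parameter.
columns = dict(zip(("h", "p1", "p2"), zip(*TALKER_1.values())))
for name, values in columns.items():
    print(f"{name}: mean={mean(values):.3g}  sd={stdev(values):.3g}  "
          f"sd/mean={variability(values):.2f}")
```

For this talker, p_1 gives a ratio near 0.1, while h and p_2 both exceed 0.5. This illustrates the contrast described above.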

Another general observation that can be made is that the ratio of p_1
to p_2 is of the order of 100 and generally decreases across vocabulary
types from Vp to Vw to Vm.
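Because p_V follows from the two-way parameters as h p_1 + (1 − h) p_2 (the M = 2 case of eq. (25)), Table III and the efficiency column of Table IV can be cross-checked, and the ratio p_1/p_2 can be read off directly. The sketch does this for vocabulary Vw; the dictionary and function names are illustrative.

```python
# Table III, vocabulary Vw, efficiency recognition error: talker -> (h, p1, p2).
VW = {
    1: (0.055, 9.49e-3, 9.80e-5),
    2: (0.020, 5.24e-3, 4.17e-5),
    3: (0.076, 5.47e-2, 1.68e-4),
    4: (0.208, 7.36e-2, 4.21e-4),
    5: (0.076, 3.04e-2, 1.99e-4),
    6: (0.072, 3.28e-2, 1.52e-4),
}
# Table IV, efficiency recognition error column: talker -> p_V.
TABLE_IV_PV = {1: 6.11e-4, 2: 1.46e-4, 3: 4.31e-3, 4: 1.57e-2, 5: 2.49e-3, 6: 2.49e-3}


def average_probability(h, p1, p2):
    """Two-way case of eq. (25): p_V = h p1 + (1 - h) p2."""
    return h * p1 + (1.0 - h) * p2


for talker, params in VW.items():
    h, p1, p2 = params
    print(f"talker {talker}: p_V={average_probability(*params):.3g} "
          f"(Table IV {TABLE_IV_PV[talker]:.3g}), p1/p2={p1 / p2:.0f}")
```

For every talker, the recomputed p_V agrees with Table IV to within rounding, and the Vw ratios fall between roughly 100 and 330.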


3.4.2 Model fits to confusion error as a function of threshold, talker, and 
vocabulary type 


We turn now to two-way mixture models of confusion error and 
estimates of the model parameters. Experimental confusion error 
results are shown plotted as a function of vocabulary size in Fig. 5 for 
talker 6, vocabulary type Vw, and seven threshold values. Efficiency 
error is plotted in Fig. 5a and standard error in Fig. 5b. Accompanying 
each curve is a two-way mixture model fit based on eqs. (26) and (27). 
Parameter estimates for efficiency error are presented in Table V. As
threshold increases so does confusion error, and the model parameters
h, p_1, and p_2 generally increase as well. As with recognition error,
parameter estimates for standard error are omitted. However, there is
reasonable agreement between the parameter estimates obtained from
efficiency error and standard error, with the exception of the lowest
threshold value, where the data are too sparse for reliable estimation.
Above the lowest threshold the ratio of p_1 to p_2 remains fairly
constant at approximately nine.

Estimates of p_V derived from average confusion number data and
from estimates of h, p_1, and p_2 for efficiency and standard confusion
error are presented in Table VI together with coefficients of fit. Note
that compared to recognition error results shown in Table IV, the
coefficients of fit indicate better fits for the confusion models,
sustaining a subjective impression gained by examining the figures. In
addition there is much better agreement in estimates of p_V between
efficiency and standard error, and between these estimates and the
estimates obtained from the regression line fits for average
confusion-number results. This improved agreement is attributed to the
fact that the ratio of p_1 to p_2 is much smaller for confusion than for
recognition results. The size of this ratio is a good indicator of the
disparity among the underlying populations that we are attempting to
model with two-way mixtures. The smaller the disparity, the better the
model. In the limit when p_1 equals p_2, indicating a uniform
population, a simple model containing no mixtures is appropriate.

[Fig. 5 plot: panels (a) and (b), average error percentage (0 to 100) versus vocabulary size (10 to 2000, logarithmic axis), one curve per threshold T = 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50.]
Fig. 5—(a) Average efficiency confusion error, and (b) average standard confusion
error as a function of vocabulary size, with two-way mixture model fits for talker 6,
vocabulary type Vw, and seven values of threshold, T.

Table V—Model parameter estimates for
average efficiency confusion error

  T       h       p_1             p_2
  0.20    0.050   4.62 × 10^-4    1.46 × 10^-5
  0.25    0.250   5.70 × 10^-4    6.04 × 10^-5
  0.30    0.350   1.73 × 10^-3    1.14 × 10^-4
  0.35    0.310   5.27 × 10^-3    5.67 × 10^-4
  0.40    0.379   1.15 × 10^-2    1.26 × 10^-3
  0.45    0.527   1.84 × 10^-2    2.02 × 10^-3
  0.50    0.620   3.08 × 10^-2    3.47 × 10^-3

Table VI—Comparison of model fits for average confusion number,
efficiency confusion number, and standard confusion number

          Average Confusion        Efficiency Confusion      Standard Confusion
          Number                   Error                     Error
  T       p_V            r         p_V            r          p_V            r
  0.20    3.73 × 10^-5   0.99521   3.70 × 10^-5   0.99539    4.67 × 10^-5   0.99670
  0.25    2.08 × 10^-4   0.99952   1.88 × 10^-4   0.99910    1.97 × 10^-4   0.99895
  0.30    7.17 × 10^-4   0.99984   6.81 × 10^-4   0.99971    6.62 × 10^-4   0.99950
  0.35    2.13 × 10^-3   0.99994   2.02 × 10^-3   0.99981    2.05 × 10^-3   0.99951
  0.40    5.26 × 10^-3   0.99999   5.15 × 10^-3   0.99989    5.00 × 10^-3   0.99956
  0.45    1.14 × 10^-2   0.99999   1.06 × 10^-2   0.99995    1.01 × 10^-2   0.99964
  0.50    2.24 × 10^-2   0.99998   2.04 × 10^-2   0.99986    1.97 × 10^-2   0.99930
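The same cross-check applies to the confusion-error fits. The sketch recomputes p_V from the Table V parameters for comparison with the efficiency column of Table VI. It also prints p_1/p_2, which stays roughly an order of magnitude below the recognition-error ratios. With the tabulated values, T = 0.30 gives a ratio near 15, the largest departure from the value of about nine cited for thresholds above the lowest. The names used are illustrative.

```python
# Table V (talker 6, Vw, efficiency confusion error): threshold -> (h, p1, p2).
TABLE_V = {
    0.20: (0.050, 4.62e-4, 1.46e-5),
    0.25: (0.250, 5.70e-4, 6.04e-5),
    0.30: (0.350, 1.73e-3, 1.14e-4),
    0.35: (0.310, 5.27e-3, 5.67e-4),
    0.40: (0.379, 1.15e-2, 1.26e-3),
    0.45: (0.527, 1.84e-2, 2.02e-3),
    0.50: (0.620, 3.08e-2, 3.47e-3),
}
# Table VI, efficiency confusion error column: threshold -> p_V.
TABLE_VI_PV = {0.20: 3.70e-5, 0.25: 1.88e-4, 0.30: 6.81e-4, 0.35: 2.02e-3,
               0.40: 5.15e-3, 0.45: 1.06e-2, 0.50: 2.04e-2}


def average_probability(h, p1, p2):
    """Two-way case of eq. (25): p_V = h p1 + (1 - h) p2."""
    return h * p1 + (1.0 - h) * p2


for t, params in TABLE_V.items():
    h, p1, p2 = params
    print(f"T={t:.2f}: p_V={average_probability(*params):.3g} "
          f"(Table VI {TABLE_VI_PV[t]:.3g}), p1/p2={p1 / p2:.1f}")
```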

Figures 6 and 7 and Table VII present some additional aspects of
confusion-error models. Figure 6 shows confusion error results for the
three vocabulary types as a function of vocabulary size, together with
model fits, with the threshold, T, set to 0.3. Results are shown
individually for each of three talkers. Efficiency error results are shown
in Figs. 6a through 6c and standard error results in Figs. 6d through 6f.
The usual degradation in performance is found passing from Vp to Vw
to Vm. Model parameter estimates for efficiency error for all six talkers
and three vocabulary types are presented in Table VII. It can be noted
that the increase in error rate across these vocabulary types is not
consistently accompanied by an increase in the value of the parameter
estimates, as was obtained for recognition error.

Fig. 6—(a), (b), (c) Average efficiency confusion error, and (d), (e), (f) average standard confusion error as a function of vocabulary size, with
two-way mixture model fits, for three vocabulary types.

Table VII—Model parameter estimates for average efficiency confusion error

                   Vp                            Vw                            Vm
Talker     h      p_1           p_2           h      p_1           p_2           h      p_1           p_2
1          0.350  3.59 × 10^-3  1.16 × 10^-4  0.209  8.34 × 10^-3  7.29 × 10^-4  0.723  6.65 × 10^-3  …
2          0.204  2.84 × 10^-3  4.02 × 10^-5  0.200  4.00 × 10^-3  3.64 × 10^-4  0.661  4.34 × 10^-3  …
3          0.450  1.97 × 10^-3  1.10 × 10^-5  0.200  6.45 × 10^-3  5.90 × 10^-4  0.700  5.74 × 10^-3  …
4          0.060  1.94 × 10^-3  1.12 × 10^-4  0.300  9.87 × 10^-4  4.89 × 10^-5  0.353  2.60 × 10^-3  …
5          0.161  2.50 × 10^-3  1.67 × 10^-4  0.400  1.96 × 10^-3  3.07 × 10^-5  0.551  3.66 × 10^-3  …
6          0.070  4.18 × 10^-3  2.41 × 10^-4  0.350  1.73 × 10^-3  1.14 × 10^-4  0.450  3.68 × 10^-3  …
Mean       0.216  2.84 × 10^-3  1.15 × 10^-4  0.277  3.91 × 10^-3  3.13 × 10^-4  0.573  4.45 × 10^-3  …
Std. dev.  0.156  8.99 × 10^-4  8.73 × 10^-5  0.087  2.94 × 10^-3  2.97 × 10^-4  0.149  …             …

                   Mean                                Standard Deviation
Talker     h      p_1           p_2           h      p_1           p_2
1          0.427  6.19 × 10^-3  2.94 × 10^-4  0.266  2.41 × 10^-3  3.79 × 10^-4
2          0.355  3.73 × 10^-3  1.45 × 10^-4  0.265  7.87 × 10^-4  1.90 × 10^-4
3          0.450  4.72 × 10^-3  2.32 × 10^-4  0.250  2.41 × 10^-3  2.13 × 10^-4
4          0.238  1.84 × 10^-3  7.54 × 10^-5  0.156  8.11 × 10^-4  3.28 × 10^-5
5          0.371  2.71 × 10^-3  9.40 × 10^-5  0.197  8.69 × 10^-4  6.87 × 10^-5
6          0.290  3.20 × 10^-3  2.18 × 10^-4  0.197  1.29 × 10^-3  9.50 × 10^-5

Figure 7 shows confusion error results with model fits for the six
talkers and the single vocabulary type Vw with the threshold, T, set
at 0.3. Efficiency error results are shown in Fig. 7a and standard error
results in Fig. 7b. As with recognition error there is considerable
variability across talkers, although for confusion error there are no
prominent extreme individual performances. There is also an apparent
greater tendency for the error rates to converge for small vocabulary
sizes. From Table VII, it is apparent that there is a fairly consistent
tendency for the parameters p_1 and p_2, but not h, to increase with
increasing error rate. Means and standard deviations are calculated
across talkers and across vocabulary types in the table. Using these
as indicators of variability, as was done for recognition-error
parameters, it appears that estimates of p_1 across vocabulary types,
and of h across talkers, have relatively small variability, the same as
for recognition error. There is also some suggestion that p_1 has low
variability across talkers.

Fig. 7—(a) Average efficiency confusion error, and (b) average standard confusion
error as a function of vocabulary size, with two-way mixture model fits, for six talkers,
vocabulary type Vw, and threshold value, T = 0.30.

The foregoing observations, together with similar ones made for 
recognition error, will be discussed in the following section in connec- 
tion with interpretation of the parameters. 


IV. DISCUSSION 


In the preceding section we have shown that both the confusion- 
and recognition-error performance of a recognition system can be 
modeled quite closely by assuming that there is a mixture of types of 
recognition or confusion trials, each type associated with a distinct 
probability for the occurrence of a recognition or confusion error. We 
have shown that, in most cases, assuming a mixture of two population 
types is quite adequate to represent the experimental behavior that 
has been presented, although more than two types might very well 
underlie this behavior. 

A substantial dichotomy of population types is evidenced by a large
ratio of p_1 to p_2, the probability estimates of the two populations. It
has been found that large ratios, of the order of 100 or 200, are 
generally associated with recognition error, while smaller ratios, from 
10 to 30, generally characterize confusion error. 

Where substantial dichotomies exist, it would be most interesting 
and useful to relate the different population types to actual phenomena 
associated with the speech-recognition process. Unfortunately, the 
experimental results incorporate and merge various sources of recog- 
nition and confusion error, including the talker, the talking environ- 
ment, various aspects of the vocabulary, and the recognizer itself. It is 
difficult, if not impossible, to uniquely characterize these sources in 
the model parameters. Moreover, in these experimental results, trials 
are averaged over all four repetitions of each word and all the words 
in a subset [see eqs. (28) to (33)]. It is therefore not possible to 
uniquely attribute the different types of behavior to phenomena as- 
sociated with repeated utterances of the same word on one hand, or 
to different word types in a subset, on the other. It is reasonable to 
believe, however, that a dichotomy of types for repeated utterances of 
the same word would be more significant for rank or recognition 
performance than for confusion performance. This is because an 
atypical pronunciation of a word should perturb the self-distance
distribution, and consequently the rank distribution, far more than
the distribution of distances to other words in the vocabulary. In fact,
in some results not presented here, confusion error was calculated 
from the distances between reference prototypes alone. Only small 
differences were obtained with the confusion error results calculated 
from distances between test utterances and prototypes, which have 
been presented in the previous section. We are therefore led to believe 
that differences in populations of trials for confusion error are mostly 
associated with different types of words, rather than different types of 
pronunciations of words. So we speculate that for confusion error 
there are two (or more) populations of words with confusion probabil- 
ities differing by ratios of 10 to 30; whereas, for recognition error there 
are two (or more) populations of trials with recognition probabilities 
differing by ratios of 100 to 200, where large discrepancies in self- 
distance distributions for repeated utterances of words are superim- 
posed on population differences among words. 

There are two other possibilities for making educated speculations 
associating model parameters with various phenomena in the recog- 
nition process. First, we can examine trends as an experimental 
parameter such as vocabulary type or distance threshold varies. For 
example, in the previous section note was taken of which model 
parameters vary little over the range of some experimental variables. 
Second, some clues might be obtained if small or large subset approx- 
imations of the model error formulations isolate one model parameter 
from the others. In anticipation of this possibility, some derivations 
of small and large subset approximations are provided. 


4.1 Small and large subset size approximations 


Small subset size approximations for the error formulations are
obtained by assuming that for small p and N, (1 − p)^N can be
represented by the first few terms in the binomial expansion. Rewriting
eq. (26) for M = 2 we obtain

$$\mathcal{E}\{e_N\} = 1 - h\,\frac{1-(1-p_1)^N}{N p_1} - (1-h)\,\frac{1-(1-p_2)^N}{N p_2}, \tag{35}$$

where, as in Section 3.3, we assume p_1 > p_2. Using the approximation

$$(1-p)^N \approx 1 - Np + \frac{N(N-1)}{2}\,p^2 \tag{36}$$

for small N, we obtain

$$\mathcal{E}\{e_N\} \approx \frac{N-1}{2}\,\bigl[h p_1 + (1-h) p_2\bigr] = \frac{N-1}{2}\,p_V. \tag{37}$$
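The step from eq. (35) to eq. (37) can be written out. Assuming each component of eq. (35) has the form $[1-(1-p)^N]/(Np)$, applying eq. (36) to that component and using $h + (1-h) = 1$ gives:

$$
\begin{aligned}
\frac{1-(1-p)^N}{Np} &\approx \frac{Np - \tfrac{N(N-1)}{2}\,p^2}{Np} = 1 - \frac{N-1}{2}\,p, \\
\mathcal{E}\{e_N\} &\approx 1 - h\Bigl[1 - \tfrac{N-1}{2}\,p_1\Bigr] - (1-h)\Bigl[1 - \tfrac{N-1}{2}\,p_2\Bigr]
= \frac{N-1}{2}\,\bigl[h p_1 + (1-h) p_2\bigr].
\end{aligned}
$$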
Similarly, eq. (27) rewritten for M = 2 is given by

$$\mathcal{E}\{E_N\} = 1 - h(1-p_1)^{N-1} - (1-h)(1-p_2)^{N-1}. \tag{38}$$

Approximating (1 − p)^{N−1} by 1 − (N − 1)p, we obtain

$$\mathcal{E}\{E_N\} \approx (N-1)\,\bigl[h p_1 + (1-h) p_2\bigr] = (N-1)\,p_V. \tag{39}$$


Thus, for small N, expected error grows linearly with N just as 
expected rank or confusion number does for all N [see eq. (23)]. In 
fact, the approximation for standard error is identical to eq. (23). Also, 
for these small N formulations, the expected value of efficiency error 
is just one half the expected value of standard error. This can be 
observed in the experimental results as pointed out in Section 3.1. It 
is easily verified from the basic expressions for efficiency and standard 
error found in eqs. (12) and (14) for N = 2. 
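A quick numerical check of these small-N statements follows. It assumes that eqs. (35) and (38) are the exact two-way forms and uses the M = 2 parameters of Table II purely as sample values. The efficiency-to-standard ratio is exactly one half at N = 2 and drifts from one half as N grows, while both errors stay close to their linear approximations for the smallest N.

```python
def efficiency_error(n, h, p1, p2):
    """Two-way efficiency error, eq. (35)."""
    def term(p):
        return (1.0 - (1.0 - p) ** n) / (n * p)
    return 1.0 - h * term(p1) - (1.0 - h) * term(p2)


def standard_error(n, h, p1, p2):
    """Two-way standard error, eq. (38)."""
    return 1.0 - h * (1.0 - p1) ** (n - 1) - (1.0 - h) * (1.0 - p2) ** (n - 1)


H, P1, P2 = 0.085, 4.32e-2, 1.37e-4   # Table II, M = 2 (sample values only)
P_V = H * P1 + (1.0 - H) * P2

for n in (2, 3, 5, 10):
    eff, std = efficiency_error(n, H, P1, P2), standard_error(n, H, P1, P2)
    print(f"N={n:2d}  eff={eff:.5f} vs eq.(37) {(n - 1) * P_V / 2:.5f}   "
          f"std={std:.5f} vs eq.(39) {(n - 1) * P_V:.5f}   eff/std={eff / std:.3f}")
```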

For large subset size approximations we might assume that (1 − p)^N
≈ 0 for large N. Then we can approximate the efficiency error
formulation, eq. (35), by

$$\mathcal{E}\{e_N\} \approx 1 - \frac{1}{N}\left(\frac{h}{p_1} + \frac{1-h}{p_2}\right). \tag{40}$$
A comparable approximation does not exist for standard error.
However, it is possible to approximate (1 − p)^N by e^{−Np} for small p
and moderate pN. This can be introduced in eq. (38) to obtain

$$\mathcal{E}\{E_N\} \approx 1 - h\,e^{-(N-1)p_1} - (1-h)\,e^{-(N-1)p_2}, \tag{41}$$

which is a potentially useful approximation.
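How closely eq. (41) tracks the exact two-way form, eq. (38), can be checked numerically. The sketch again borrows the M = 2 parameters of Table II as sample values. Because ln(1 − p) ≤ −p, each exponential is at least as large as the power it replaces, so eq. (41) can only understate the exact error slightly.

```python
import math


def std_error(n, h, p1, p2):
    """Exact two-way standard error, eq. (38)."""
    return 1.0 - h * (1.0 - p1) ** (n - 1) - (1.0 - h) * (1.0 - p2) ** (n - 1)


def std_error_exp(n, h, p1, p2):
    """Eq. (41): each (1 - p)^(N - 1) replaced by exp(-(N - 1) p)."""
    return 1.0 - h * math.exp(-(n - 1) * p1) - (1.0 - h) * math.exp(-(n - 1) * p2)


H, P1, P2 = 0.085, 4.32e-2, 1.37e-4   # Table II, M = 2 (sample values only)

for n in (10, 100, 1000, 2000):
    exact, approx = std_error(n, H, P1, P2), std_error_exp(n, H, P1, P2)
    print(f"N={n:4d}  eq.(38)={exact:.4f}  eq.(41)={approx:.4f}  difference={approx - exact:+.1e}")
```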

For these small and large vocabulary size approximations to be
useful in providing interpretations for the model parameters,
conditions must exist for one or the other of the population types to
dominate. For small N, since we have assumed p_1 > p_2, for type 1
populations to dominate in eqs. (37) and (39) we should have

$$\frac{h}{1-h}\,\frac{p_1}{p_2} \gg 1. \tag{42}$$

Conversely, for type 2 populations to dominate in eq. (40), we should
have

$$\frac{1-h}{h}\,\frac{p_1}{p_2} \gg 1. \tag{43}$$

4.2 Small and large subset approximations and the relation between rank 
and confusion error 


These small and large N formulations for error coincide with our
earlier discussion in Section 3.1 on the relation between confusion and
rank or recognition error. There we found that for small N, recognition
error is approximately equal to confusion error at a threshold for
which average confusion number equals average rank number. For
large N we found that recognition error approximates confusion error
at a threshold equal to average self-distance. Examining these
relationships once more with respect to model parameters, we see that
for small N, from eq. (37) or (39), the threshold for equality should be
set such that p_{V,c}(T) = p_{V,r}. (The subscripts are used to
differentiate between confusion and rank.) From eq. (23) we know that
p_V is the slope coefficient for expected rank or confusion number, so
our earlier hypothesis for small N is confirmed. For the example of
talker 3 and vocabulary Vw used in Section III, p_{V,r} is 6.67 × 10^-3
from Table I. From the same table, we see that for p_{V,c}(T) to have
this value, T should be between 0.35 and 0.40.
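The threshold bracket quoted above can be narrowed by interpolating in Table I. The sketch assumes that log p_V varies linearly with T between adjacent tabulated thresholds, a convenience not stated in the text. Under that assumption, the crossing falls near T ≈ 0.37. The function name is illustrative.

```python
import math

# Table I (talker 3, Vw): average-confusion-number slope p_V(T), and the rank slope.
CONFUSION_PV = {0.20: 1.18e-4, 0.25: 5.43e-4, 0.30: 1.77e-3, 0.35: 4.63e-3, 0.40: 1.03e-2}
RANK_PV = 6.67e-3


def threshold_for(target, table):
    """Threshold at which p_V(T) reaches target, interpolating log p_V linearly in T."""
    points = sorted(table.items())
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        if p0 <= target <= p1:
            fraction = math.log(target / p0) / math.log(p1 / p0)
            return t0 + fraction * (t1 - t0)
    raise ValueError("target outside the tabulated range")


print(f"p_V,c(T) = p_V,r at T = {threshold_for(RANK_PV, CONFUSION_PV):.3f}")
```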

For large N, from eq. (40), assuming eq. (43) holds, equal errors
should be obtained if (1 − h)/p_2 is the same for both confusion and
rank. Again for the same example, for rank or recognition,
(1 − h)/p_2 ≈ 6 × 10^3. Although confusion model parameter estimates
are not shown for talker 3 as a function of threshold, for a threshold
of 0.235, corresponding to average self-distance, they are approximately
0.04, 5.3 × 10^-3, and 1.8 × 10^-4 for h, p_1, and p_2, respectively.
Thus, (1 − h)/p_2 ≈ 5.3 × 10^3, which agrees well with the value
obtained for rank. Thus, the model formulations and parameter
estimates support the original hypothesis for large N as well. For large
N the density of words for a given vocabulary type is great enough
that, even though the distributions of rank numbers and confusion
numbers for a threshold set to average self-distance are not the same,
the proportion of zero and nonzero rank and confusion numbers, which
correlates well with both kinds of error, is about the same. In the
discussion that follows, we will conjecture that the parameters of the
large N formulation, h and p_2, are largely associated with vocabulary
type, which should be the major factor controlling density.
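The large-N comparison reduces to one number per fit, (1 − h)/p_2. The text does not say which rank-parameter set gives its value of about 6 × 10^3. The sketch assumes the efficiency-error estimates for talker 3, Vw, in Table III are intended, since eq. (40) is an efficiency-error formula. Those estimates give 5.5 × 10^3, and the confusion parameters quoted for T = 0.235 give about 5.3 × 10^3.

```python
def large_n_constant(h, p2):
    """(1 - h)/p2: with type 2 dominant, eq. (40) is roughly 1 - (1 - h)/(N p2)."""
    return (1.0 - h) / p2


FITS = {
    "confusion, T = 0.235 (values quoted in the text)": (0.04, 1.8e-4),
    "rank, efficiency error (Table III, talker 3, Vw; assumed)": (0.076, 1.68e-4),
}

for label, (h, p2) in FITS.items():
    print(f"{label}: (1 - h)/p2 = {large_n_constant(h, p2):.2g}")
```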


4.3 Small and large subset size approximations and dominance of 
population types 


Now let us examine some of the experimental results to determine
to what extent eq. (42) or (43) holds. For recognition error, in Table
III, we find that generally high ratios of p_1 to p_2 are to a large extent
offset by small values of h/(1 − h). Thus, although the value of the
expression in eq. (42) is nearly always greater than one, it is not
consistently greater than 10, a value for which we could say
unequivocally that type 1 populations dominate small vocabulary size
behavior. Those instances in which the expression assumes values less
than 10 are associated with low error rates, for example, for talkers 1
and 2. Quite the opposite is true for large size behavior, since in
eq. (43), (1 − h)/h is always greater than one and the ratio of p_1 to p_2
remains large. Thus large size behavior is consistently dominated by
type 2 populations in eq. (40), and also in eq. (41), as is easily verified.
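The two dominance indicators are simple to tabulate. The sketch evaluates the left-hand sides of eqs. (42) and (43) for the Vw efficiency-error parameters of Table III. With these values, the small-N indicator falls below 10 only for talkers 1 and 2, while the large-N indicator runs into the hundreds or thousands for every talker. The names are illustrative.

```python
# Table III, vocabulary Vw, efficiency recognition error: talker -> (h, p1, p2).
VW = {
    1: (0.055, 9.49e-3, 9.80e-5),
    2: (0.020, 5.24e-3, 4.17e-5),
    3: (0.076, 5.47e-2, 1.68e-4),
    4: (0.208, 7.36e-2, 4.21e-4),
    5: (0.076, 3.04e-2, 1.99e-4),
    6: (0.072, 3.28e-2, 1.52e-4),
}


def small_n_indicator(h, p1, p2):
    """Left side of eq. (42): type 1 dominates small-N behavior when this is >> 1."""
    return h / (1.0 - h) * p1 / p2


def large_n_indicator(h, p1, p2):
    """Left side of eq. (43): type 2 dominates large-N behavior when this is >> 1."""
    return (1.0 - h) / h * p1 / p2


for talker, params in VW.items():
    print(f"talker {talker}: eq.(42) {small_n_indicator(*params):6.1f}   "
          f"eq.(43) {large_n_indicator(*params):7.0f}")
```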

For confusion error, with the threshold fixed at 0.3, the results in 
Table VII indicate that although the ratio of p_1 to p_2 is smaller than
for recognition error, the value of h/(1 − h) is generally greater.
Consequently, overall, the expression in eq. (42) assumes about the same
range of values as for recognition error. Similar observations are made 
for large size behavior in both confusion error and recognition error. 

To the extent that type 1 populations control small vocabulary 
behavior and type 2 populations control large vocabulary behavior, it 
is natural to associate type 1 populations with trials or words that are 
chronically “bad” in some sense, and type 2 populations with the 
“natural” density of a particular vocabulary type. Thus, type 1 errors 
persist when alternate choices are few and the vocabulary size is small, 
while the natural density of words in the vocabulary must be important 
when the vocabulary size is large. By this hypothesis we should expect 
that good performance associated with low error rates should have 
only a weak dominance of type 1 trials or words, since there should be 
fewer bad words or trials. This is, in fact, what is observed. The 
hypothesis is also in agreement with the observations made earlier in 
this section on the dichotomy of populations for recognition and 
confusion error. 


4.4 Experimental variability of model parameter estimates and parameter 
origins 

We can now stretch our speculations on the origins of model
parameters further by recalling our earlier observations of their
relative variability across the experimental variables. We assume
three sources of error: the talker, the vocabulary, and everything else,
which we lump into the recognition system. For confusion error, p_1
was observed to have low variability across vocabulary types, and to a
lesser extent, across talkers. Therefore, we could associate p_1, the type
1 probability, largely with the system. For recognition or rank
performance, p_1 has low variability only across vocabulary types, and is
therefore associated with both talker and the system. This reflects the
effect of self-distance distribution, which is clearly talker dependent.
The type 2 probability, p_2, was observed to have low variability across
talkers for recognition or rank performance (except for one talker).
Although a similar observation was not made for confusion performance,
it is natural to associate p_2 with vocabulary type and the system. This
hypothesis is compatible with the vocabulary density role associated
with p_2 earlier in our discussion. Finally, h, the mixing coefficient for
the two types of populations, was observed to have low variability
across talkers for both confusion and recognition performance. We
are therefore led to believe that h is largely a function of vocabulary
type, with the role of the system unclear.


V. CONCLUSION 


The data extracted from a series of isolated word recognition
experiments with large vocabularies have enabled us to hypothesize and
verify a simple probabilistic model underlying the performance of
recognizers. Essentially, we have attempted to model the distributions of
confusion number, an a priori characterization of a recognizer, and
rank number, an a posteriori characterization. Expressions have been
derived for three functions of confusion or rank number: the average
confusion or rank number, and two error functions, standard error and
efficiency error. The models have been evaluated and interpreted using
experimental values of these functions. The difference between standard
error and efficiency error has been described, and an attempt has been
made to describe and interpret the difference between confusion and
rank performance.

It is significant that good models for performance are obtained only
by assuming a mixture of probability distributions as the basis. The
reduction of the performance of a recognition system over a large
range of vocabulary sizes to as few as three parameters enhances our
understanding of the processes involved and has potential practical
utility in the evaluation of systems. Over the range of experimental
variables available in this series of experiments, we have been able
to speculate on associations of the model parameters with variables in
the recognition process. Placing these suggestions on firmer footing
will require additional experimental data. For example, useful data
can be obtained from a large number of repeated utterances for a given
talker and vocabulary, in order to attribute behavior differences
uniquely to the different words in a vocabulary or to repetitions of the
same words. Examining results obtained by passing the same utterances
through different recognizers, or through systematic variations of the
same recognizer or recording environment, will also be revealing. The
use of a larger number of more sharply distinct vocabulary types will
also provide useful information. In addition, it is important to devise
experiments to evaluate the predictive power of the models: once the
parameters of a model have been estimated, new experimental data
obtained with controlled variation of experimental variables should be
consistent with the model.

Digitized speech can be transmitted over a variety of digital media. An
interesting choice is a Local-Area Network (LAN), in which digitized
speech is packetized at the transmitter and depacketized at the
receiver. Many local-area networks exhibit good throughput but poor delay
characteristics; variable or excessive transmission delay can become
noticeable and objectionable to the users of such a voice system. A
number of simulations were performed to assess the delay characteristics
of a Carrier Sense Multiple Access/Collision Detection (CSMA/CD) LAN and
of a similar token bus LAN. A comparison of the results shows that the
token bus performs somewhat better. The CSMA/CD LAN carried voice well
until it reached a point of collapse, whereas the token bus's performance
degraded more gradually. In either case, throughput close to the
theoretical capacity of the LAN proved achievable with appropriate
techniques.


I. CHARACTERISTICS OF DIGITAL VOICE 


Human speech of telephone quality can easily be encoded into a
64-kb/s bit stream containing 8000 8-bit speech samples per second;
although much more efficient encodings are possible, this synchronous
64-kb/s speech encoding is assumed throughout this paper.† Speech
consists of talkspurts separated by silences: a speaker in a typical
conversation talks about 40 percent of the time and is silent for the
remainder, so an approach that transmits speech only during talkspurts
can be desirable. Silences, of course, are relative. Ideally, no speech
should be lost by being mistaken for silence, and no extraneous
background sounds should intrude during silences. This ideal can be
approached through the use of cutoff levels with memory.

* AT&T Bell Laboratories.

† As a simple improvement, the use of delta modulation to transmit only
the differences between successive samples could produce savings of
approximately 2:1. More extensive processing could yield greater savings
by taking further advantage of the regular properties of human speech.
Improvements in the encoding bit rate will improve the performance
figures presented in this paper, but these improvements will typically
not be linear, since a reduction in the bit rate will make other factors
relatively more important. Similarly, although variable bit-rate
encodings can produce further savings over fixed bit-rate encodings, they
can lose many of the advantages shown for fixed bit rates in this paper,
and will again have less of a total impact than might otherwise be
expected.

Transmitting digital speech over a shared packet network entails
packetizing the digital signal at the transmitter, transporting it over
the network, and depacketizing it at the receiver; these operations can
introduce delay. Table I gives a high-level breakdown of the delay in
the transmission, which has a fixed component and a variable
component. Because of the variable component, if the receiving station
begins playing a packet as soon as it is received, it can introduce
artificial silences at some points (when a packet is delayed more than
the one before it) and lose speech at others (when a packet is delayed
less than the one before it), ultimately producing effects audible to the
users. This variability can be partially overcome by artificially
delaying packets at the receiver, so that only those packets whose
variable delay exceeds some threshold will cause anomalies; since speech
is inherently real-time, however, packets cannot be queued arbitrarily
long at the transmitter or receiver.

Table I—Speech sample delay breakdown

Type of delay   Consisting of
Fixed           The (nominal) temporal length of the packet: the packet
                size measured by its acquisition time, made up of
                  the delay after the packet is acquired until the packet
                  is completed and transmitted (the temporal length of the
                  portion of the packet following this sample), plus
                  the delay after the packet is received until the sample
                  is played back (the temporal length of the portion of
                  the packet preceding this sample)
                Plus much smaller fixed delays (e.g., the transmission time)
Variable        The delay in transmitting the packet: the delay in
                obtaining the transmission medium
                Plus (typically) smaller variable delays

The user-level model of speech used in this paper is that of a typical
two-way conversation, in which real-time constraints exist at both
ends. If either side of the voice conversation were a computer or
similar device, knowledge of this fact could be used to ease the
constraints somewhat, although this optimization is not considered in
this paper. If both ends were known to be computers, speech could
then be transmitted as a non-real-time data transfer.

If packets are artificially delayed, the one-way voice sample delay
from the transmitter to the receiver during successful transmission
will roughly equal the packet size plus the threshold delay. Increasing
the packet size will increase the effective bandwidth of the system (by
reducing per-packet overhead); increasing the artificial delay will
reduce the incidence of anomalies (by reducing the probability that a
packet will have been delayed for longer than the threshold). Reducing
the traffic speeds access to the shared network; reducing anomalies
postpones the onset of overload. On the other hand, increasing these
values increases the delay through the system, which will eventually
become perceptible to the user; this suggests a compromise between
the extremes. For example, the one-way delay on a single-hop
synchronous-orbit satellite voice circuit is 270 ms, which many users
find disruptive; the double-hop delay of 540 ms is considered much
worse. Since a system built on one LAN may frequently communicate with
another system on another LAN (thereby at least doubling the end-to-end
delay), the one-way delay on a given LAN should be kept well below
270/2 = 135 ms. An alternative might be to treat inter-LAN connections
differently from intra-LAN connections; this possibility is not
considered here. In any case, the delay cannot be allowed to grow
without bound. The exact nature of this compromise depends on the
psychoacoustic importance of this delay compared with, say, the effect
of the anomalies caused by variable delay; this trade-off is not well
understood. Echo, it should be noted, is perceived as much more
disruptive than simple delay, with the audible threshold occurring much
earlier, but echoes, where they might occur, can be controlled through
the use of echo cancelers.
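The delay budget argued above reduces to simple arithmetic. The sketch below assumes, as the text does, that the one-way delay on one LAN is roughly the packet size plus the receiver's threshold delay, and that an inter-LAN call crosses two LANs; the helper names are hypothetical.

```python
# Delay-budget arithmetic from the text (helper names are hypothetical).
SINGLE_HOP_SATELLITE_MS = 270.0   # one-way satellite delay many users find disruptive


def one_way_lan_delay_ms(packet_ms: float, threshold_ms: float) -> float:
    """Approximate one-way delay on one LAN: packet size plus threshold delay."""
    return packet_ms + threshold_ms


def per_lan_budget_ms(lans_traversed: int = 2) -> float:
    """Each LAN's share of the satellite-hop delay on a call crossing several LANs."""
    return SINGLE_HOP_SATELLITE_MS / lans_traversed


def within_budget(packet_ms: float, threshold_ms: float, lans_traversed: int = 2) -> bool:
    """True if the per-LAN delay is below that LAN's share of the budget.

    The text asks for "well below"; how much margin to keep is left to the caller.
    """
    return one_way_lan_delay_ms(packet_ms, threshold_ms) < per_lan_budget_ms(lans_traversed)


print(per_lan_budget_ms())               # 135.0
print(one_way_lan_delay_ms(5.75, 5.75))  # 11.5
print(within_budget(50.0, 50.0))         # True: 100 ms < 135 ms
```

The 5.75-ms and 50-ms settings used here are the two configurations simulated in Sections V and VII. Both fall inside the 135-ms share, although the second leaves far less margin.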


II. A TYPICAL CSMA/CD LAN


Ethernet* is a typical Carrier-Sense Multiple Access/Collision
Detection (CSMA/CD) LAN.¹ Data packets are transmitted bidirectionally
over a coaxial cable with an acyclic branching topology. Access to
the net is distributed (“multiple access”) and statistical. A station
wishing to transmit first listens to determine whether the net is in use
(“carrier sense”); if it is, the station defers until the current user has
finished transmitting its packet. If the net is not in use, the station
begins to transmit. Because of race conditions, two stations could begin
to transmit simultaneously; when one station notices another
transmitting (“collision detection”), it aborts its transmission, jams
the net to ensure that the other stations also notice the collision and
abort their transmissions, and retries after a random amount of time,
thereby statistically avoiding a recollision.

* Ethernet is a trademark of Xerox Corporation.
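The text leaves the retry interval as "a random amount of time." The Ethernet specification defines this interval as truncated binary exponential backoff, sketched below with the specification's slot time of 512 bit-times (51.2 µs at 10 Mb/s), a backoff exponent capped at 10, and abandonment after 16 attempts. The assumption that the simulations in this study used exactly this rule is ours, as is the function name.

```python
import random

# Ethernet retransmission parameters (Ethernet / IEEE 802.3 CSMA/CD).
SLOT_TIME_BIT_TIMES = 512   # 51.2 us at 10 Mb/s
BACKOFF_LIMIT = 10          # the backoff range stops doubling after 10 collisions
ATTEMPT_LIMIT = 16          # a frame is abandoned after 16 unsuccessful attempts


def backoff_bit_times(collisions: int, rng: random.Random) -> int:
    """Random wait before retrying, after `collisions` collisions of the same frame.

    Truncated binary exponential backoff: pick k uniformly from
    0 .. 2**min(collisions, BACKOFF_LIMIT) - 1 and wait k slot times.
    """
    if not 1 <= collisions < ATTEMPT_LIMIT:
        raise ValueError("no backoff: frame has not collided or should be abandoned")
    k = rng.randrange(2 ** min(collisions, BACKOFF_LIMIT))
    return k * SLOT_TIME_BIT_TIMES


rng = random.Random(1)
print([backoff_bit_times(n, rng) for n in range(1, 6)])  # range widens with each collision
```

Doubling the range after each collision is what "statistically avoids recollision." Two stations that collided once have a 50-percent chance of choosing different slots on the first retry, and that chance rises with every further collision.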

An Ethernet CSMA/CD network is bit serial and runs at 10 Mb/s; a
bit-time is thus 0.1 µs. Assuming 64-kb/s speech, complete utilization
of the bandwidth would allow up to 195.3 two-person conversations, in
which each person speaks 40 percent of the time (10 Mb/s divided by
2 × 0.4 × 64 kb/s = 51.2 kb/s per conversation). Such efficiency,
however, can never be achieved in practice.
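The 195.3 figure is the link rate divided by the average bit rate of one conversation. A sketch of that arithmetic follows; the names are ours.

```python
# Upper bound on voice conversations if every bit on the cable carried speech.
LINK_RATE_BPS = 10_000_000   # Ethernet CSMA/CD data rate
SPEECH_RATE_BPS = 64_000     # synchronous 64-kb/s speech
ACTIVITY = 0.4               # each person talks about 40 percent of the time
TALKERS = 2                  # two-person conversations


def raw_conversation_capacity(link_bps: float = LINK_RATE_BPS) -> float:
    """Conversations the link could carry with zero overhead and zero contention."""
    per_conversation_bps = TALKERS * ACTIVITY * SPEECH_RATE_BPS   # 51,200 b/s on average
    return link_bps / per_conversation_bps


print(f"{raw_conversation_capacity():.1f}")  # 195.3
```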

One reason is simple per-packet overhead. A transmission on an
Ethernet CSMA/CD network begins with 64 sync bits, followed by the
packet. A packet contains 112 bits of header, a 368- to 12,000-bit data
field (thus between 5.75 ms and 187.5 ms of 64-kb/s speech), and a
32-bit CRC field. A station may begin to transmit when it has seen the
net idle for 96 bit-times. Assuming (arbitrarily) that voice stations are
uniformly distributed along a maximum-length linear CSMA/CD network,
computations based on the Ethernet propagation delay budget give a
worst-case mean one-way propagation time of about 10.06 µs; we can
therefore expect an arbitrary station to see the net go idle 100.6
bit-times after the arbitrary preceding station actually ceased to
transmit. A linear CSMA/CD network is in some ways a “best case,” since
the mean distance between stations will be less than in a more general
topology. However, the limiting case in complex topologies is extremely
unlikely. Similarly, uniform distribution is a “best case,” but a more
accurate characterization seems difficult to achieve.

Taking per-packet overhead into account, we see that, at the small- 
est packet size, the speech samples can occupy only 47.6 percent of 
the bandwidth, allowing a maximum of 93.0 conversations; at the 
largest packet size, 96.7 percent of the bandwidth can be speech 
samples, allowing 188.9 conversations. 
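The 47.6- and 96.7-percent figures can be reproduced by charging each packet with its 64 sync, 112 header, and 32 CRC bits, plus an idle gap of 100.6 bit-times (until the net is seen to go idle) and 96 bit-times (the required idle period). This accounting is our reconstruction, chosen because it matches all four quoted numbers; the paper does not spell it out.

```python
# Per-packet overhead on the Ethernet CSMA/CD network, in bit-times (our accounting).
SYNC_BITS, HEADER_BITS, CRC_BITS = 64, 112, 32
IDLE_REQUIRED = 96        # the net must be seen idle this long before transmitting
IDLE_DETECT = 100.6       # mean lag before a station sees the net go idle
OVERHEAD = SYNC_BITS + HEADER_BITS + CRC_BITS + IDLE_REQUIRED + IDLE_DETECT  # 404.6
RAW_CONVERSATIONS = 10_000_000 / (2 * 0.4 * 64_000)   # 195.3125 with zero overhead


def speech_fraction(data_bits: float) -> float:
    """Share of channel time spent on speech samples when every packet has data_bits."""
    return data_bits / (data_bits + OVERHEAD)


for data_bits in (368, 12_000):   # minimum and maximum data fields
    f = speech_fraction(data_bits)
    print(f"{data_bits:>6} bits: {100 * f:.1f}% speech, "
          f"{RAW_CONVERSATIONS * f:.1f} conversations")
# prints:
#    368 bits: 47.6% speech, 93.0 conversations
#  12000 bits: 96.7% speech, 188.9 conversations
```

The same function gives a zero-contention ceiling for any other packet size, such as the 50-ms packets considered later. Collisions and deferrals push real capacity below that ceiling.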

Studies of the Ethernet specifications under varying load conditions 
have typically shown Ethernet CSMA/CD networks to have very 
desirable throughput characteristics (e.g., see Ref. 2). Throughput 
tends to rise linearly with offered load until saturation is approached, 
and then levels off, with an asymptotic throughput within a few 
percent of maximum for large packets and within several percent for 
small packets. (For an experimental Ethernet CSMA/CD network 
described in Ref. 2, whose numerical parameters differed significantly 
from those of the specifications discussed here, measured throughput 
reached 96 percent for maximum-size packets, and 83 percent for 
minimum-size packets. The traffic in this study, as in the case consid- 
ered here, was produced by a number of stations each offering a 
fraction of the total load.) Throughput may decrease in certain
cases of extreme overload (for example, two stations each attempting
to offer 100-percent load to the net would ultimately transmit less
data together than either would individually, because of their
contention), although this decrease evidently does not become
pathological.

On the other hand, individual packets may experience significantly 
greater delays under heavy loads than under light loads. The nature 
of this increase in the delay has not been well characterized in past 
studies of data traffic, which is not as badly affected by variable delays 
as is real-time voice. Although we can be reasonably certain that the
voice samples will make the journey from the transmitter to the
receiver at loads almost up to the physical transport limits of the
network, this is of little use if they take too long to do so.


III. A SIMULATION STUDY


To determine the performance characteristics of voice traffic on a 
CSMA/CD network, a computer simulation was prepared based on 
Ethernet specifications. The stations were assumed to be uniformly 
distributed over a maximum-length network. Both voice and data 
traffic were modeled. 

The voice stations modeled typical two-person conversations. The 
stations were therefore paired, with an appropriate distribution and 
correlation of talkspurts and silences (adapted from Brady, as dis- 
cussed in Ref. 3). Brady’s study included filtering out very short 
silences and very short talkspurts, thereby increasing the mean length 
of silences and talkspurts and otherwise modifying their distribution. 
The exact type of voice filtering best suited for transmission over a
LAN is still uncertain. Voice packets were not transmitted during
silences. The simulations began with one conversation, after which an 
additional conversation was added every 0.5 second of simulated time, 
until the system had passed saturation. This staged introduction of 
conversations helps to eliminate anomalies associated with the start- 
up of several conversations at once. Although it is possible that the 
monotonically increasing number of conversations could produce his- 
tory artifacts in the simulation results, none were observed in the 
CSMA/CD simulations; these did occur in the token bus studies 
outlined in Section IX. 
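The staged start-up and the 0.8 speakers-per-conversation figure can be sketched as follows. The talker model is a deliberately simple, hypothetical substitute: independent exponential talkspurts and silences instead of Brady's filtered, correlated empirical distributions. It therefore reproduces only the 40-percent activity, not the detailed statistics or the correlation between the two parties.

```python
import math
import random

ADD_INTERVAL_S = 0.5              # one conversation added every 0.5 s of simulated time
SPEAKERS_PER_CONVERSATION = 0.8   # two talkers, each active 40 percent of the time


def conversations_at(t_s: float) -> int:
    """Conversations in progress at simulated time t_s (the first starts at t = 0)."""
    return 0 if t_s < 0 else math.floor(t_s / ADD_INTERVAL_S) + 1


def expected_speakers_at(t_s: float) -> float:
    return SPEAKERS_PER_CONVERSATION * conversations_at(t_s)


def talk_fraction(total_s: float, mean_talk_s: float = 1.0,
                  mean_silence_s: float = 1.5, seed: int = 0) -> float:
    """Long-run fraction of time a single on/off talker spends in talkspurts.

    Hypothetical stand-in for Brady's data: independent exponential talkspurts
    and silences whose means (1.0 s and 1.5 s, our choice) give 40 percent activity.
    """
    rng = random.Random(seed)
    activity = mean_talk_s / (mean_talk_s + mean_silence_s)
    speaking = rng.random() < activity
    t = talking = 0.0
    while t < total_s:
        mean = mean_talk_s if speaking else mean_silence_s
        d = min(rng.expovariate(1.0 / mean), total_s - t)
        if speaking:
            talking += d
        t += d
        speaking = not speaking
    return talking / total_s


print(conversations_at(49.9), expected_speakers_at(49.9))  # 100 80.0
print(round(talk_fraction(10_000.0), 2))                   # close to 0.4
```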

Figure 1 shows the actual number of speakers over time, as a
function of the number of conversations, for one particular voice
traffic pattern. This pattern and another like it were used throughout
the simulations to control for the effect of differing traffic patterns
in separate simulations. (As it turned out, the effect of a particular
voice traffic pattern on observed behavior was smaller than anticipated
and is easily compensated for. Thus, the use of the same voice traffic
patterns throughout the simulations seems to have been unnecessary.)

Fig. 1—Number of simulated speakers in a voice traffic pattern. This graph presents
a voice traffic pattern used in most of the simulations presented in this paper, plotting
the number of instantaneous speakers as a function of the number of conversations.
One simulated two-person conversation is added every 0.5 second of simulated time;
there are 0.8 expected instantaneous speakers per conversation. Speakers divide their
time between talkspurts and silences. This simulation uses an empirically derived
distribution of the lengths of talkspurts and silences and of the correlation between the
states of the two potential speakers in a conversation.

The data stations presented a bimodal distribution of packet lengths,
typical of data traffic on real nets, with 80-percent minimum-size
packets and 20-percent maximum-size packets (giving approximately
the opposite distribution when weighted by length). Packet arrivals
were modeled by a Poisson process. The traffic generated by a
Poisson process is not as bursty as real data traffic, but the difference
was expected to be relatively unimportant in determining the effect of
the data traffic on the voice traffic. The simulations included steady
data loadings of between 0 and 10 percent of the system, the latter
value being well beyond the measured steady loadings of current
Ethernet CSMA/CD networks.
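To load the net with a given data fraction, the Poisson arrival rate follows from the mean packet length. Weighted by length, minimum-size packets carry only about 11 percent of the data bits (0.8 × 368 bits against 0.2 × 12,000 bits), which is the "approximately opposite" distribution noted above. The sketch assumes each data packet also occupies the 404.6 bit-times of framing and idle gap reconstructed in Section II; that overhead figure and the function names are ours.

```python
import random

LINK_RATE_BPS = 10_000_000
MIN_DATA_BITS, MAX_DATA_BITS = 368, 12_000
P_MIN = 0.8            # 80 percent of data packets are minimum size
OVERHEAD = 404.6       # per-packet framing plus idle gap in bit-times (our accounting)


def mean_packet_time_s() -> float:
    """Mean channel time per data packet, including overhead."""
    mean_bits = P_MIN * MIN_DATA_BITS + (1 - P_MIN) * MAX_DATA_BITS + OVERHEAD
    return mean_bits / LINK_RATE_BPS


def arrival_rate_per_s(load_fraction: float) -> float:
    """Poisson rate that keeps the channel busy with data load_fraction of the time."""
    return load_fraction / mean_packet_time_s()


def data_arrivals(load_fraction: float, duration_s: float, seed: int = 0):
    """Yield (arrival_time_s, data_bits) for Poisson arrivals with bimodal lengths."""
    rng = random.Random(seed)
    rate = arrival_rate_per_s(load_fraction)
    t = rng.expovariate(rate)               # exponential gaps give a Poisson process
    while t < duration_s:
        yield t, (MIN_DATA_BITS if rng.random() < P_MIN else MAX_DATA_BITS)
        t += rng.expovariate(rate)


print(round(arrival_rate_per_s(0.05)))   # about 161 packets/s for 5-percent loading
```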
IV. VOICE TRANSPORT ALGORITHMS


The simplest algorithm for transmitting voice would be to packetize 
the digital speech, dropping packets containing only silences, and to 
send them to an autonomous network interface to be transmitted 
asynchronously. The simplest algorithm for receiving voice would be 
to receive packets asynchronously from an autonomous network in- 
terface, and begin to play back the first packet of a talkspurt after 
some artificial delay, with subsequent packets of the talkspurt each 
immediately following its predecessor. 

An important improvement on the transmission algorithm at the
source deals with the case in which packet transmission must be delayed
until the net can be acquired. If more speech samples are buffered
while the packet is waiting to be transmitted, they can be appended to
the waiting packet before it is transmitted instead of being used to
start a new packet. This approach has three advantages:

1. It tends to transmit fewer packets under a heavy load, thereby
applying a degree of negative feedback.

2. The varying length of a packet serves as a sort of time-stamp.
Since the last speech sample in the packet was collected just before
the packet was successfully transmitted, the receiver can determine
the exact age of the first speech sample, allowing more precise
control over packet playback.

3. It produces an adaptive effect. In the simplest case, each station
will begin to attempt to transmit a new packet one packet-time after
beginning to attempt to transmit the previous packet. Here, though,
this will occur one packet-time after the last packet was successfully
transmitted. In the first case, if two stations happen to collide with
each other once, they will then collide with each other every
packet-time afterwards until one of them begins a silence. In the second
case, a collision, once resolved, creates a phase shift that persists
thereafter.
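A minimal sketch of the adaptive transmitter's timing follows, under assumptions of our own. Time is counted in 125-µs sample ticks, the nominal packet is 46 samples (5.75 ms), and a list of access delays stands in for whatever the CSMA/CD interface actually reports. Samples keep accumulating until the net is acquired, so each packet's length is the nominal size plus its access delay.

```python
NOMINAL_SAMPLES = 46   # 5.75-ms nominal packet: 46 samples of 125 us (368 data bits)


def adaptive_packets(access_delays, nominal=NOMINAL_SAMPLES):
    """(send_tick, n_samples) for successive packets of one continuous talkspurt.

    Time is counted in 125-us sample ticks; access_delays[i] is how long the
    i-th packet waits for the net (a stand-in for the real CSMA/CD outcome).
    A packet is ready `nominal` ticks after the previous *successful*
    transmission, and samples arriving while it waits are appended, so its
    length is nominal + access delay.
    """
    packets = []
    last_send = 0                     # the talkspurt starts filling at tick 0
    for delay in access_delays:
        send = last_send + nominal + delay
        packets.append((send, send - last_send))
        last_send = send              # the next packet starts from this point
    return packets


print(adaptive_packets([0, 10, 0]))   # [(46, 46), (102, 56), (148, 46)]
```

The 56-sample packet tells the receiver that its first sample is 56 × 125 µs = 7 ms old. The next packet is scheduled from tick 102 rather than tick 92, which is the persistent phase shift of advantage 3.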

At the receiver, we buffer packets to cope with their variable delay. 
If the speech samples are implicitly time-stamped by the variable 
packet size, it is possible to correct for the delay that the first packet 
of a talkspurt has already experienced in transmission. 
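Because the last sample of a packet was taken just before transmission, a packet of n samples arriving at the receiver has a first sample that is about n × 125 µs old. The sketch below assumes that every sample is meant to be played P + D after it was taken (P the nominal packet size, D the threshold delay), and it ignores the small fixed delays of Table I. Under those assumptions the receiver can compute how much artificial silence to place ahead of the first packet of a talkspurt, or how many of its samples are already too late. The function name is hypothetical.

```python
def talkspurt_start(n_samples: int, nominal: int, threshold: int):
    """Return (silence_ticks, late_samples) for the first packet of a talkspurt.

    All values are in 125-us sample ticks. The packet's last sample was taken
    just before transmission, so on arrival its first sample is n_samples ticks
    old (fixed delays ignored). Every sample is meant to be played
    nominal + threshold ticks after it was taken.
    """
    slack = (nominal + threshold) - n_samples
    if slack >= 0:
        return slack, 0                     # pad the FIFO with this much silence
    return 0, min(-slack, n_samples)        # the oldest samples have missed their slot


print(talkspurt_start(46, 46, 46))    # (46, 0): no access delay, 5.75 ms of silence
print(talkspurt_start(70, 46, 46))    # (22, 0): 24 ticks already spent waiting
print(talkspurt_start(100, 46, 46))   # (0, 8): 8 samples arrive too late
```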

The receiver implementation can be quite simple. The voice path is
implemented as a first-in, first-out (FIFO) buffer: packets are inserted
as they are received, while samples are extracted synchronously. The
first packet of a talkspurt is preceded in the FIFO by the appropriate
amount of artificial silence; the beginning of a talkspurt can be
detected by the FIFO being empty. This scheme is easily extended to
connections with more than one other speaker, with multiple independent
speech sources being merged together: each speaker is assigned a
separate FIFO, and the outputs of the FIFOs are summed. The FIFOs can
be implemented in hardware or software.
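The FIFO receiver just described, extended to several speakers, might look like the sketch below. We assume samples have already been converted to linear integer values (companded samples cannot be summed directly) and that each FIFO has a fixed capacity. The class and function names are hypothetical.

```python
from collections import deque


class SpeakerFifo:
    """Playout FIFO for one remote speaker: packets in asynchronously, one sample out per tick."""

    def __init__(self, capacity: int):
        self.buf = deque()
        self.capacity = capacity

    def put_packet(self, samples, lead_silence: int = 0) -> int:
        """Queue a packet; return how many of its samples were discarded on overflow."""
        if not self.buf:                   # empty FIFO: this packet starts a talkspurt
            self.buf.extend([0] * min(lead_silence, self.capacity))
        dropped = 0
        for s in samples:
            if len(self.buf) < self.capacity:
                self.buf.append(s)
            else:
                dropped += 1               # packet longer than the FIFO: discard the excess
        return dropped

    def get_sample(self) -> int:
        return self.buf.popleft() if self.buf else 0   # an empty FIFO plays silence


def mix_tick(fifos) -> int:
    """One output sample: the sum of the next sample from every speaker's FIFO."""
    return sum(f.get_sample() for f in fifos)


a, b = SpeakerFifo(8), SpeakerFifo(8)
a.put_packet([1, 2, 3], lead_silence=2)
b.put_packet([10, 20])
print([mix_tick([a, b]) for _ in range(6)])   # [10, 20, 1, 2, 3, 0]
```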

Samples that arrive too late are discarded; they will have been
preceded by an artificial silence. Excessive delays can result in
packets that are longer than the FIFO, in which case part of the packet
can be discarded. For every interval of artificial silence we
accidentally introduce, we lose an equal amount of speech, except when
the last samples of a talkspurt are delayed excessively. In that case,
the artificial silence before playing them back is matched by losing
part of the real silence elsewhere. With proper matching between
transmitter and receiver, the transmitter can predict which samples
the receiver would discard and simply not transmit them in the first
place, thereby reducing net traffic under heavy load and avoiding a
potential instability.
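If the transmitter knows the receiver's playout rule, predicting the discards is a simple age test. The sketch below drops the oldest samples of a packet that would miss their playback slot. It also computes the percentage-of-samples-discarded measure that Fig. 4 uses for channel degradation (there counted at the transmitter). The exact-deadline rule and the names are our assumptions.

```python
def trim_late_samples(samples, send_tick: int, first_sample_tick: int,
                      nominal: int, threshold: int):
    """Drop, before sending, the samples the receiver would discard anyway.

    Assumes the receiver plays each sample exactly nominal + threshold ticks
    (125-us units) after it was taken and that fixed delays are negligible.
    Sample i was taken at first_sample_tick + i; it is late if its age at
    send_tick exceeds that deadline. Returns (samples_to_send, n_dropped).
    """
    deadline_age = nominal + threshold
    n_late = max(0, (send_tick - first_sample_tick) - deadline_age)
    n_late = min(n_late, len(samples))
    return samples[n_late:], n_late


def percent_lost(dropped: int, total: int) -> float:
    """Degradation measure: percentage of speech samples discarded."""
    return 100.0 * dropped / total if total else 0.0


sent, dropped = trim_late_samples(list(range(100)), send_tick=100,
                                  first_sample_tick=0, nominal=46, threshold=46)
print(len(sent), dropped, percent_lost(dropped, 100))   # 92 8 8.0
```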


V. BASIC CSMA/CD PERFORMANCE 


Simulations were performed to measure the voice capacity and 
related characteristics of CSMA/CD LANs. For a simulation in which 
voice stations used (nominally) minimum-size voice packets (5.75 ms), 
and in which there was no data traffic, Fig. 2 shows the transmission 
delays that voice packets experienced. Note that the delay is essentially 
zero (i.e., less than the quantizing sample time of 125 µs) until the
equivalent of approximately 60 conversations is reached, at which
point the delay rises roughly linearly. [While the expected number of
speakers at an arbitrary point in the simulation is 0.8 times the number
of conversations, the actual number of speakers will vary from this,
depending on the details of the traffic pattern. We define the effective
number of conversations at a point in time as the actual number of
speakers divided by 0.8 (the expected value of the effective number of
conversations is the actual number of conversations). We found that
plotting transmission performance against the effective number of
conversations, rather than the actual number, gave much smoother graphs
and made the curves much more similar across different traffic
patterns. Most of the graphs in this paper are therefore based on
effective conversations; they may be converted to actual conversations
by adding appropriate randomness along the horizontal axis.] Note that
the standard deviation is several times the mean, because of the long
tail of the distribution of delays. This is illustrated in Fig. 3, which
shows the distribution of delays at a loading of 50 conversations.
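Both points in the passage above are easy to state in code. The conversion is as defined in the text. The delay sample, by contrast, is invented purely to show why a distribution with many zero delays and a long tail has a standard deviation several times its mean (compare Fig. 3).

```python
import statistics

SPEAKERS_PER_CONVERSATION = 0.8


def effective_conversations(actual_speakers: int) -> float:
    """Effective conversations: actual instantaneous speakers divided by 0.8."""
    return actual_speakers / SPEAKERS_PER_CONVERSATION


# Hypothetical long-tailed sample of per-packet delays (ms): most packets see no
# delay and one waits a long time, so the standard deviation dwarfs the mean.
delays_ms = [0.0] * 9 + [4.0]
mean = statistics.mean(delays_ms)
sd = statistics.pstdev(delays_ms)
print(effective_conversations(48), mean, round(sd, 6))   # 60.0 0.4 1.2
```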

If we set a threshold of an additional 5.75 ms of artificial delay of
voice samples, we can bring the total delay through the system to
11.5 ms. (Given a desired total delay of 11.5 ms, it would be possible to
allocate less of the total to variable delay and more to packet size; the
reverse would also be possible. An extreme position in either direction
can be counterproductive, so an equal division is not totally
unreasonable. However, as will be shown later in this paper, it seems
better, for CSMA/CD networks, to allocate significantly more delay to
packet size than to variable delay.) At this delay we can expect to
transmit up to about 60 conversations well, and to lose some speech
samples past that point, as shown in Fig. 4. Here, the vertical axis
measures the percentage of speech samples lost, which roughly models
the degradation of the channel. Further study is needed to determine
the effect of other parameters of the artificial silences and lost
speech on human users; for example, the number and length of these
anomalies are probably important.

Fig. 2—CSMA/CD delay for 5.75-ms voice packets in the absence of data. This graph
shows the mean and standard deviation of the delay experienced in the transmission of
(nominally) 5.75-ms (i.e., minimum-size) voice packets in the absence of data traffic, as
a function of the number of effective conversations. Note that both the mean and the
standard deviation are essentially zero (i.e., less than the quantizing sample time, 125
µs) until about 60 effective conversations are reached, at which point they grow roughly
linearly (distorted here by the logarithmic vertical scale) and become quite large; the
standard deviation far exceeds the mean.

As a test of validity, the results of these simulations (as well as the
ones following) were compared with a previous study of voice
transmission on CSMA/CD networks’; the results were found to
correspond closely.

PACKET SPEECH TRANSMISSION 41

[Fig. 3 plot: cumulative distribution versus delay in milliseconds, 0 to 200 ms on a logarithmic axis]

Fig. 3—CSMA/CD cumulative distribution of voice packet transmission delays for a
loading of 50 conversations and 5.75-ms packets. This graph shows the cumulative
distribution of the variable transmission delays experienced over a period of 0.5 second
when the simulated CSMA/CD network was loaded with 50 conversations and no data
traffic. The vertical axis is exponential; the horizontal axis is logarithmic. We see that
about 65 percent of the packets experienced no delay, that over 99 percent were
transmitted in less than 125 µs (the quantum phase shift possible using the adaptive
algorithm), and that one took over 4 ms. The shape of this curve causes the standard
deviation to exceed the mean, as shown in Fig. 2.

As an example of the importance of the adaptive nature of the 
variable packet-size algorithm, Fig. 5 shows the effect of using fixed 
packet sizes; we see that the channel degrades much sooner. 


VI. EFFECT OF DATA TRAFFIC ON CSMA/CD CAPACITY 


As we have seen, the naturally synchronous character of voice traffic,
in conjunction with an adaptive algorithm, enables the transmitters, in
effect, to slot themselves and thereby interfere only minimally with
each other. As the traffic increases, though, talkspurts begin to arrive
faster than they can settle in, and this structure begins to disintegrate.
It is therefore to be expected that the addition of data traffic, with its
inherently asynchronous nature, will interfere with the voice traffic
disproportionately, so that adding some amount of data traffic will
eliminate more than an equivalent amount of voice traffic capacity.

Fig. 4—CSMA/CD voice channel degradation with increasing load with 5.75-ms
packets and 5.75-ms artificial delay. This graph shows the voice signal degradation
experienced on a simulated CSMA/CD network with 5.75-ms packets and an additional
5.75-ms artificial delay at the receiver. Two simulations with different voice traffic
patterns were performed and their results superimposed. Degradation is measured as
the percentage of voice samples that are discarded (here at the transmitter). There is
no degradation until about 58 effective conversations, soon after which the degradation
rises almost vertically: the network is saturated, and each new conversation causes a
conversation's worth of speech samples to be lost.
Fig. 5—CSMA/CD voice channel degradation with increasing load with 5.75-ms
packets and 5.75-ms artificial delay, with a nonadaptive algorithm. This graph shows 
the same relation between simulated network load and voice channel degradation as 
does Fig. 4, except that it uses a simple nonadaptive transmission algorithm. As we see, 
the expected performance of such a system is significantly poorer than one with the 
adaptive algorithm, in regard both to the point at which degradation begins and the 
point at which the curve becomes essentially vertical. 


This phenomenon does in fact occur. Figure 6 shows the delay
experienced by voice packets on a CSMA/CD system with 5-percent
data loading. Note that there is no longer any region of essentially
zero delay, as there was in Fig. 2 without data loading, and that the
knees of the curves, although significantly less well defined here,
certainly occur more than 5 percent sooner than before. Figure 7 shows
the channel degradation with 5.75-ms buffering at the receiver.

Additional simulation results, not shown here, were obtained for
10-percent data loading; they essentially extend this trend.


44 TECHNICAL JOURNAL, JANUARY 1984 


[Fig. 6 plot: mean (o) and standard deviation (x) of delay in milliseconds versus effective conversations, 0 to 180]


Fig. 6—CSMA/CD delay for 5.75-ms voice packets with 5-percent data loading. This 
graph shows the effect of a 5-percent data traffic loading on the mean and the standard 
deviation of the voice packet delay on the simulated CSMA/CD network; it should be 
compared with Fig. 2, where no data loading was assumed. Notice that there is no longer 
any large region of essentially zero delay, and that the 5-percent data loading has shifted 
the curves to the left far more than 5 percent. 


VII. INCREASING THE DELAY IN A CSMA/CD SYSTEM 


We can increase the effective bandwidth of the system by increasing 
the packet size at the transmitter or by increasing the variable delay 
threshold at the receiver. If we choose a relatively large value of 50 ms 
for each, resulting in a 100-ms total delay through the system, we find 
that, as shown in Fig. 8, about 150 effective conversations can take 
place on the Ethernet CSMA/CD network in the absence of data. 
Assuming 5-percent data loading reduces this number to about 125 
effective conversations, as shown in Fig. 9. 

It seems likely that the point of diminishing returns has been 
reached at the 50-ms level; further increases in the packet size or 
receiver delay cannot produce any great increase in the capacity of the 
network, but they could subjectively degrade the channel by increasing 
its delay. 
Fig. 7—CSMA/CD voice channel degradation with increasing load with 5.75-ms
packets and 5.75-ms artificial delay, with 5-percent data loading. This graph shows the 
signal degradation on simulated voice channels over a CSMA/CD network in the 
presence of 5-percent data loading; it should be compared with Fig. 4, in which there 
was no data loading. Again, two simulations were performed. Degradation rises signifi- 
cantly earlier than with no data loading; there is more than a 5-percent degradation in 
the effective bandwidth. The effect of a 5-percent data loading on a system with a 
nonadaptive fixed packet size (not shown here) is comparatively less, since it does not 
take advantage of the synchronous nature of the voice packets. 


VIII. A TOKEN BUS


An additional simulation study was performed to determine the
suitability of a token-passing LAN for carrying voice. In a
token-passing LAN, contention is resolved through the use of a conceptual
circulating token. A station may transmit only while it holds the
token, and must then pass the token to the next station in the logical
sequence. A token ring is a token-passing LAN with a physical ring
topology; the logical sequence is typically the same as the physical
sequence of stations on the ring. A token bus is a token-passing LAN
with a physical bus topology (linear or acyclic branching); the logical
sequence can often be arbitrary but is most efficient if it corresponds
to the physical sequence.

Fig. 8—CSMA/CD voice channel degradation with increasing load with 50-ms pack-
ets and 50-ms artificial delay. This graph shows the signal degradation experienced on
simulated voice channels over a CSMA/CD network with 50-ms packets and an
additional 50-ms artificial packet delay at the receiving station; it should be compared
with Fig. 4, which assumes smaller numerical values. As in Fig. 4, two simulations were
performed and their results superimposed. We see that a large increase in the delay
through the system can produce a significant increase in its effective bandwidth.

A token bus was chosen as the token-passing LAN most directly
comparable with a CSMA/CD LAN, and the numerical parameters of
the token bus were chosen to be as similar as possible to those of the
Ethernet specifications and the choices of the above CSMA/CD
simulations. These choices are quite possibly far from optimal for a
token-passing LAN, but they allow for a simple comparison with the
CSMA/CD results; there is no typical design for token-passing systems
that corresponds to Ethernet among CSMA/CD systems.

Fig. 9—CSMA/CD voice channel degradation with increasing load with 50-ms
packets and 50-ms artificial delay, with 5-percent data loading. This graph shows the
signal degradation experienced on simulated voice channels over a CSMA/CD network
with a long delay through the system, in the presence of 5-percent data loading; it should
be compared with Fig. 8, in which there was no data loading. Again, two simulations
were performed and their results superimposed. Note that the proportionate drop in
effective bandwidth caused by the data traffic is much less than in the small-delay case
(compare Figs. 4 and 7).

The simulated token bus LAN has a single token circulating; when 
a station receives the token, it either transmits a packet, which 
implicitly passes the token to the next station in logical sequence, or 
it transmits an abbreviated packet, containing only a header, to 
explicitly pass the token. The bus is linear, of maximum Ethernet
length, with stations uniformly distributed. No attempt is made to 
match the token-passing sequence to the physical sequence of stations 
on the bus, or to model the (small) control traffic needed to expand 
the sequence when new conversations, and their associated stations, 
are added. 

The voice packet delays in the token bus simulation are shown in
Fig. 10. [To understand the sawtooth shape of the curves in Fig. 10,
consider a simplified case. We transmit (nominally) 5-ms packets.
There are enough stations in the logical ring for the token to require
2 ms to circulate in the absence of any transmissions. Assume that for
each active station (one associated with an active speaker) to transmit
its 5-ms packet each time around would require an additional 4 ms
total. An active station will be ready to transmit 5 ms after it has last
transmitted, but if every station transmits every time around, the
token will take (a little over) 6 ms to circulate, and so every station
will experience the same (a little over) 1-ms delay; this delay can
remain constant as long as the load on the network does. On the other
hand, if the token were circulating faster, so that it needed only 4 ms
for its transit, then a station would transmit only every other time
around and experience a delay of 3 (4 + 4 − 5) ms. If stations
transmitted only every other time around, the time needed under the
original assumptions for a token cycle would be 2 + 4/2 = 4 ms, as
assumed; this shows that the performance of a token-passing system
cannot always be determined uniquely from the load, and can therefore
depend upon history.] We note that the mean delay is never less than
the mean delay for the corresponding CSMA/CD case shown in Fig. 2.
However, the standard deviation for a token bus under sufficient load
is much smaller than for the CSMA/CD network: all packets experience
very similar delays, as shown in Fig. 11.

Fig. 10—Token bus delay for 5.75-ms voice packets in the absence of data. This
graph shows the mean and standard deviation of the transmission delay on a simulated
token bus with no data; it should be compared with Fig. 2, which shows the corresponding
delay for a CSMA/CD network. The mean delay is never less than for the CSMA/CD
case; the standard deviation under heavy load is much less than for the CSMA/CD case
but greater under light load. The sawtooth shape of the curves shows that these figures
are not unique and depend on the transmission history. The horizontal axis measures
actual conversations instead of effective conversations, since even silent stations take
part in token circulation.

Fig. 11—Token bus cumulative distribution of voice packet transmission delays for
a loading of 50 conversations and 5.75-ms packets. This graph shows the cumulative
distribution of the variable transmission delays experienced on a simulated token bus
loaded with 50 conversations and no data traffic; it should be compared with Fig. 3,
which shows the equivalent case for a simulated CSMA/CD network. We note that
almost all packets are delayed by essentially the same amount of time, which reflects an
essentially constant token circulation rate during this period.
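The bracketed argument about non-unique token-bus behavior can be checked numerically. The sketch below is a toy model, not the simulator used in this study, and the function name and parameters are illustrative. It assumes every active station sends one packet every `period` token visits. One rotation therefore lasts the idle rotation time plus the total packet time divided by `period`. A regime is kept only if a station is ready by its period-th visit but not by the visit before.

```python
from fractions import Fraction


def token_rotation_regimes(idle_cycle, total_tx, interval, max_period=10):
    """Enumerate self-consistent steady states of a simplified token bus.

    Toy model (assumption): every active station transmits once every
    `period` token visits, so a rotation lasts idle_cycle + total_tx / period.
    A station becomes ready `interval` ms after its last transmission, so the
    regime is consistent only if the station is ready by its period-th visit
    but was not yet ready at the visit before. Returns (period, cycle, delay).
    """
    regimes = []
    ready = Fraction(interval)
    for period in range(1, max_period + 1):
        cycle = Fraction(idle_cycle) + Fraction(total_tx) / period
        if period * cycle >= ready and (period - 1) * cycle < ready:
            regimes.append((period, cycle, period * cycle - ready))
    return regimes


# Numbers from the text: 2-ms idle rotation, 4 ms of packets, 5-ms packets.
for period, cycle, delay in token_rotation_regimes(2, 4, 5):
    print(f"transmit every {period} rotation(s): cycle {cycle} ms, delay {delay} ms")
```

With the numbers from the text, two regimes survive. In the first, stations transmit on every rotation, giving a 6-ms cycle and a 1-ms delay. In the second, they transmit on every other rotation, giving a 4-ms cycle and a 3-ms delay. Which one the bus settles into depends on how it got there.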

Adding 5-percent data loading to a token bus increases the delays 
to those shown in Fig. 12. Again, the mean is never less than the mean 
for the CSMA/CD case shown in Fig. 6, but the standard deviation is 
much smaller under heavy load. 

Fig. 12—Token bus delay for 5.75-ms voice packets with 5-percent data loading. This
graph shows the mean and the standard deviation of the delay experienced in the
transmission of voice packets on a simulated token bus with 5-percent data loading; it
should be compared with Fig. 6, which shows the corresponding delay for a CSMA/CD
network. Note that the mean delay is never less than for the CSMA/CD case; the
standard deviations for a token bus are significantly less than for CSMA/CD under
heavy load, although they are greater under light load.

[Plot: effective conversations (0 to 100) versus delay in milliseconds, logarithmic axis from 0.2 to 200 ms.]

Fig. 13—CSMA/CD capacity as a function of receiver buffering delay, with 5.75-ms
packets, 1-percent sample loss, and no data loading. This graph shows the capacity,
measured in effective conversations, of a simulated CSMA/CD network as a function
of the buffering delay at the receiving station, with (nominally) 5.75-ms packets and
allowing up to 1-percent of the speech samples to be lost (at the receiver), in the absence
of data. We note that the capacity depends very little on the buffering at the receiver;
this suggests that, of some total allowable delay through the system, more delay should
be allocated to packet length than to receiver buffering.

To allow a more direct comparison, Fig. 13 shows the capacity of a
CSMA/CD network as a function of the amount of buffering at the
receiver, with 5.75-ms packets and no data loading, and allowing
1-percent speech sample loss. No compensation at the transmitter for
the buffering at the receiver, in the form of locally discarding samples 
that would otherwise simply be discarded remotely, was performed in 
these simulations. Figure 14 shows CSMA/CD capacity with 5-percent 
data loading. By contrast, Figs. 15 and 16 show the corresponding 
relations for the token bus with no data loading and with 5-percent 
data loading, respectively. Figures 13 through 16 show the token bus 
to offer significantly more capacity than the CSMA/CD network,
suggesting that a token-passing network is superior to a CSMA/CD
network for voice transmission. We can see that the CSMA/CD LAN’s
performance is much less dependent on the receiver delay than is that
of the token bus, suggesting that the increase in the overall delay
caused by increasing the receiver buffering would be better spent in 
increasing the packet length while keeping the receiver delay relatively 
small. On the other hand, a token bus can efficiently keep a relatively 
small nominal packet size and benefit directly from an increase in 
receiver buffering, as shown. 


IX. CONCLUSIONS 


It is possible to transmit a large number of voice conversations on
either a CSMA/CD LAN or a token-passing LAN in the presence of
reasonable data loading. The performance of token-passing seems
superior to that of CSMA/CD.

Fig. 14—CSMA/CD capacity as a function of receiver buffering delay, with 5.75-ms
packets, 1-percent sample loss, and 5-percent data loading. This graph shows the
capacity, measured in effective conversations, of a simulated CSMA/CD network as a
function of the buffering delay at the receiving station, with (nominally) 5.75-ms packets
and allowing up to 1-percent of the speech samples to be lost (at the receiver), with
5-percent data loading. As in Fig. 13, which considered the corresponding case with no
data loading, the capacity depends fairly little on the amount of buffering at the receiver,
again suggesting that receiver buffering should be kept fairly small and its share of the
overall delay used in allowing the packet size to grow.

[Plot: conversations versus delay in milliseconds, logarithmic axis from 0.2 to 200 ms.]

Fig. 15—Token bus capacity as a function of receiver buffering delay, with 5.75-ms
packets, 1-percent sample loss, and no data loading. This graph shows the capacity,
measured in actual conversations, of a simulated token bus as a function of the buffering
delay at the receiving station, with (nominally) 5.75-ms packets and allowing up to
1-percent of the speech samples to be lost (at the receiver), in the absence of data. The
significant increase in capacity with increased receiver buffering, plus some reasoning
on the nature of token-passing, suggest that the total delay through a token bus system
should be allocated predominantly to receiver buffering, with relatively small nominal
packet sizes.

Fig. 16—Token bus capacity as a function of receiver buffering delay, with 5.75-ms
packets, 1-percent sample loss, and 5-percent data loading. This graph shows the
capacity, measured in actual conversations, of a simulated token bus as a function of
the buffering delay at the receiving station, with (nominally) 5.75-ms packets and
allowing up to 1-percent of the speech samples to be lost (at the receiver), with
5-percent data loading. As in Fig. 15, which considered the corresponding case with no
data loading, the capacity depends significantly on the amount of buffering at the
receiver, again suggesting that receiver buffering in a token bus system should be kept
fairly large and the nominal packet size fairly small.

There are even better mechanisms for transmitting digital voice: 
time-division multiplexing schemes, for example, can do an excellent 
job for voice, but are not exceptional for carrying data because of their 
inherently synchronous nature. Similarly, CSMA/CD LANs can be 
superior to token-passing for many data applications. It is still a
research problem to find a LAN that can carry both voice and data
“optimally”, or to identify more exactly the appropriate trade-offs.

One significant unanswered question is the potential of such a 
system for serving large numbers of users; there is an inherent limit
on the number of users on one LAN. It is uncertain to what extent
internetworking can help, since internetworking would increase the 
mean and standard deviation of the delays. 
Trunk Implementation Plan for Hierarchical
Networks 
The Trunk Implementation Plan (TIP) is a multiyear schedule of planned
trunk augments and disconnects that minimizes the impact of varying demand 
and forecast uncertainties on the cost of implementing a network meeting 
objective service criteria. This paper presents a theoretical development of the 
TIP algorithm for hierarchical networks that accounts for modularity, facility, 
and demand servicing constraints. First, we solve the only-route TIP problem 
analytically using stochastic dynamic programming techniques. We show that 
our analytical solution yields a numerically efficient algorithm for calculating 
a multiyear, minimum-cost policy. Second, we present the results of our 
analysis showing that, in the presence of forecast uncertainty, a near-optimal 
traffic network is obtained by introducing reserve capacity on the final trunk 
groups only. Based on this result, we construct the TIP algorithm for hierar- 
chical networks by combining conventional network engineering principles 
with an optimal disconnect policy for high-usage trunk groups and the only- 
route TIP sizing procedure for final groups. Using this algorithm we obtain 
an economical multiyear schedule of trunk augments and disconnects for 
hierarchical networks. 


I. INTRODUCTION 
1.1 Background and motivation 


In the Bell operating companies and AT&T Communications, the
trunk forecasting process consists of: (1) traffic measurement and
offered load estimation, (2) projection of future traffic demands, and
(3) determination of the trunk group sizes for each of five or more
future years. Currently, the engineering procedure utilized in (3) is
based on independent, single-year network designs, each of which
minimizes the cost of satisfying anticipated demands for a given future
year.

Fig. 1—Trunk network forecasting process.

Although substantial work has been done to improve the quality of 
the trunk forecasting process (Refs. 1 through 3), the existing methods do not account
for several important implementation considerations, namely, the 
existing trunk network, the variation of trunk demand from year to 
year, uncertainty of demand forecasts, economics of maintaining or 
rearranging trunks, and facility constraints. Consequently, in practice, 
the output of (3) is modified by trunk forecasters to make the final 
multiyear schedule of planned trunk augments and disconnects feasi- 
ble and economically sensible. The adjustments to the mechanized 
forecasting process are based on heuristic trunk disconnect guidelines 
and engineering judgment. However, no quantitative attempt is made 
to find an optimal multiyear trunk provisioning plan. 

Accordingly, we identified the need for a mechanized system that 
would compute an economical capacity expansion plan for the trunk 
network. As Fig. 1 illustrates, the capacity expansion planning can be 
regarded as the fourth major function of the trunk forecasting process. 
The new mechanized system, called the Trunk Implementation Plan 
(TIP), will provide a multiyear schedule of trunk augments and 
disconnects that accounts for facility constraints while minimizing the 
impact of forecast uncertainty and demand dynamics on the cost of 
implementing a network meeting objective service criteria.

In this work, we present a theoretical development of the TIP 
algorithm and generalize the mathematical model of Ref. 4 to reflect 
modularity and facility constraints. The problem formulation in Ref. 
4 assumes a nonmodular engineering environment and no facility 
constraints; i.e., the multiyear schedule of trunk augments and discon- 
nects is cost-effective under the assumption that the sizes of the trunk 
groups are nonnegative real numbers and that the cost of a trunk 
group is proportional to the number of trunks in the group. 

However, the assumptions of nonmodular engineering do not hold, 
for example, for a new generation of digital terminals such as the 
Digital Carrier Trunk (DCT), used for the 1A ESS, and the Digital
Interface Frame (DIF), used for 4ESS. The DCT and DIF require that
digital carriers [equivalent to 24 voice frequency (VF) circuits] ter- 
minate directly on the switch. The dedication of network facilities by 
destination implies that the cost of a trunk group has a per-module 
component in addition to the per-circuit component. Therefore, in 
Ref. 1, W. Elsner concluded that trunk groups terminating on such 
facilities should be modularly engineered. In particular, Ref. 1 shows 
that an engineering procedure that assumes only modular sizes (24 
trunks for two-way groups) for high-usage and certain final trunk 
groups provides significant economic benefits; see Fig. 2 for definitions 
of trunking terminology. 

In this paper, we replace the continuous TIP model of Ref. 4 by a 
discrete TIP formulation that incorporates modularity constraints. 
Then we derive an optimal modular network expansion policy for 
high-usage and final trunk groups. In addition, we show how to modify 
the TIP multiyear trunk-sizing procedures to reflect facility con- 
straints. 


1.2 Overview 


Section II defines the notation and describes the mathematical
model for the only-route TIP problem. In Section III we present a
complete solution to the only-route TIP problem that accounts for
modularity, facility, and demand servicing constraints. Section IV
shows how to combine conventional network engineering principles
with the only-route TIP results to design an economical multiyear
hierarchical network.

HU - HIGH-USAGE GROUP—DESIGNED TO
RECEIVE OVERFLOW TRAFFIC
IHU - INTERMEDIATE HU GROUP—HU THAT RECEIVES OVERFLOW TRAFFIC
F - FINAL GROUP—LAST-CHOICE GROUP
OR - ONLY-ROUTE GROUP—F GROUP THAT RECEIVES ONLY FIRST ROUTE TRAFFIC

Fig. 2—Trunking terminology.


II. TIP MODEL FOR ONLY-ROUTE TRUNK GROUPS


We start our derivation of the TIP algorithm by considering the 
multiyear engineering problem for only-route trunk groups (see Fig. 2 
for a definition of only-route trunk groups). As we discuss in Section 
4.3, our solution of the only-route problem will be utilized to plan 
multiyear capacity expansion for hierarchical networks. 


2.1 Mathematical model 
2.1.1 Notation 


First, we define the notation used in our mathematical model. 

T(k)—number of trunks in service at the beginning of the kth year 

u(k)—number of planned trunk augments/disconnects at the begin- 
ning of the kth year 

d(k)—peak demand: the maximum number of trunks needed during year k
to guarantee the engineered blocking level

$F_k$—the distribution of the peak trunk demand during year k

$c_1^k(u(k))$—capital cost during year k

$c_2^k(u(k))$—labor cost during year k

$c_3^k(T(k), u(k))$—maintenance cost during year k

$c_4^k(d(k), T(k), u(k))$—underprovision cost during year k

N—number of years in the forecast horizon 

We assume that the number of trunks in service, T(k); the planned
trunk level, T(k) + u(k); and the peak demand, d(k); are expressed in
modules of m trunks. In the message network, m is equal to 1, 12, or
24. Consequently, $F_k$ is a discrete distribution function, defined in
accordance with the established rounding rules for engineering modular
final groups (Ref. 1). That is, if $\tilde F_k$ is a continuous distribution of
the peak demand, then $F_k$ is obtained by

$$F_k(m\ell) = \tilde F_k(m\ell + \nu),$$

where $\ell$ is a nonnegative integer and $\nu$ is a rounding threshold,
$0 \le \nu < m$.
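The rounding rule $F_k(m\ell) = \tilde F_k(m\ell + \nu)$ can be sketched as a wrapper around any continuous CDF. A continuous demand x is assigned to the smallest module $m\ell$ with $x \le m\ell + \nu$. The normal forecast distribution and its parameters below are hypothetical, chosen only for illustration.

```python
import math


def modular_cdf(F_cont, m, nu):
    """Module-level CDF F_k(m*l) = F~_k(m*l + nu), as a function of l.

    A continuous peak demand x is rounded to the smallest module m*l with
    x <= m*l + nu, which is what makes P(d <= m*l) equal F~(m*l + nu).
    """
    if not 0 <= nu < m:
        raise ValueError("rounding threshold must satisfy 0 <= nu < m")
    return lambda l: F_cont(m * l + nu)


def normal_cdf(mean, sd):
    """Normal CDF via math.erf; an illustrative forecast-error model only."""
    return lambda x: 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


# Hypothetical: peak demand ~ Normal(60, 10) trunks, 24-trunk modules, nu = 6.
F = modular_cdf(normal_cdf(60.0, 10.0), m=24, nu=6)
for l in range(5):
    print(l * 24, round(F(l), 4))
```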


2.1.2 Trunk group dynamics 


According to AT&T practice, if the blocking objective on an
only-route or alternate final trunk group is violated significantly, then
the trunk group is augmented during the year on an emergency basis
(demand servicing) to restore the engineered blocking level. Therefore,
the number of trunks in service at the beginning of the (k + 1)th year
is the sum of the planned trunk level for year k and the demand
servicing augmentation, if any, during year k. Thus, the trunk group
dynamics that reflect the planned and demand servicing components
of the trunk provisioning process are modeled by

$$T(k+1) = [T(k) + u(k)] + \max[0,\, d(k) - (T(k) + u(k))] = \max[y(k),\, d(k)], \qquad (1)$$

where y(k) = T(k) + u(k) represents the planned trunk level at year
k.


2.1.3 Objective function 


The goal of the only-route TIP is to minimize the expected present
worth of trunk provisioning costs. If we denote the present worth of
the total cost for year k by $g_k(d(k), T(k), u(k))$, then the TIP objective
function can be expressed as

$$\min_u J(u) = \min_u E\Big\{\sum_{i=0}^{N-1} g_i(d(i), T(i), u(i))\Big\}, \qquad (2)$$

where u = (u(0), ..., u(N − 1)) and the expected value is taken over
the demands d(0), d(1), ..., d(N − 1).
The present worth of trunk provisioning costs at year k is equal to

$$g_k(d(k), T(k), u(k)) = p^k\big[c_1^k(u(k)) + c_2^k(u(k)) + c_3^k(T(k), u(k)) + c_4^k(d(k), T(k), u(k))\big], \qquad (3)$$

where p is the discount factor (p < 1) that measures the worth of the
next year’s dollars in terms of present dollars.

The capital, labor, maintenance, and underprovision costs are
assumed to be piecewise linear with respect to modules of m trunks and
are defined for k = 0, ..., N − 1 by

$$c_1^k(u(k)) = \begin{cases} a_1^k u(k), & u(k) \ge 0 \\ b_1^k u(k), & u(k) < 0, \end{cases} \qquad (4)$$

$$c_2^k(u(k)) = \begin{cases} a_2^k u(k), & u(k) \ge 0 \\ -b_2^k u(k), & u(k) < 0, \end{cases} \qquad (5)$$

$$c_3^k(T(k), u(k)) = a_3^k\, y(k), \qquad (6)$$

$$c_4^k(d(k), T(k), u(k)) = a_4^k \max(0,\, d(k) - y(k)), \qquad (7)$$

where y(k) = T(k) + u(k) is the planned trunk level at year k.

Equation (7) states that if the peak demand d(k) exceeds the planned 
trunk level y(k) during year k, then the cost of providing the additional 
trunks is proportional to the trunk shortage; if d(k) does not exceed 
the planned level, then no cost is incurred. Thus, the underprovision 
cost reflects the demand servicing policy as described by the trunk-group
dynamics equation (1). The assumption that the underprovision
cost is linear with respect to the trunk shortage will be discussed in 
Section 2.1.4. 
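One year of the model can be written out directly from (1) and (3) through (7). This is a minimal sketch, and the function name is hypothetical. It assumes trunk counts are given in trunks (multiples of m) and that `a = (a1, a2, a3, a4)` and `b = (b1, b2)` hold the per-trunk costs of year k.

```python
def year_step(T, u, d, a, b, p, k):
    """Present-worth cost g_k of (3)-(7) and next state T(k+1) of (1).

    T, u, d: trunks in service, planned change, peak demand (multiples of m).
    a = (a1, a2, a3, a4), b = (b1, b2): per-trunk costs for year k.
    p: discount factor; k: year index.
    """
    a1, a2, a3, a4 = a
    b1, b2 = b
    y = T + u                                   # planned trunk level y(k)
    capital = a1 * u if u >= 0 else b1 * u      # (4): salvage credit if u < 0
    labor = a2 * u if u >= 0 else -b2 * u       # (5): disconnect labor is a cost
    maintenance = a3 * y                        # (6)
    underprovision = a4 * max(0, d - y)         # (7): demand-servicing shortfall
    g = p ** k * (capital + labor + maintenance + underprovision)   # (3)
    return g, max(y, d)                         # (1): emergency adds reach d(k)


# Hypothetical numbers: augment a 24-trunk group by one 24-trunk module.
print(year_step(24, 24, 72, (1, 1, 1, 10), (2, 1), 1, 0))
```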


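We assume that in (4) through (7) the per-trunk costs $a_1^k, \ldots, a_4^k$
and $b_1^k, b_2^k$ are nonnegative and satisfy the conditions

$$a_1^k + a_2^k > b_1^k - b_2^k, \qquad (8a)$$

$$b_1^k - b_2^k > 0, \qquad (8b)$$

$$a_1^k + a_2^k > p\,(a_1^{k+1} + a_2^{k+1}), \qquad (9a)$$

$$b_1^k - b_2^k > p\,(b_1^{k+1} - b_2^{k+1}), \qquad (9b)$$

$$a_4^k > a_1^k + a_2^k + a_3^k. \qquad (10)$$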


Inequalities (8a) and (8b) state: first, that the cost of buying and 
installing a trunk module is always greater than its salvage value 
minus the disconnect expense; and second, that there is always an 
incentive for disconnecting a trunk module. Inequalities (9a) and (9b) 
show that it is uneconomical to augment a trunk group before it is necessary
and also uneconomical to delay the disconnect decision. Finally, (10)
reflects the fact that it is always more costly to augment a trunk group 
on an emergency, rather than on a planned, basis. 
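Before running any TIP computation, it can help to confirm that a year-by-year cost table satisfies (8a) through (10). The checker below is a hypothetical helper that reports which conditions fail and in which year. Conditions (9a) and (9b) compare year k with year k + 1, so they are checked only for k ≤ N − 2.

```python
def check_cost_assumptions(a, b, p):
    """List (year, condition) pairs for which (8a)-(10) fail.

    a[k] = (a1, a2, a3, a4) and b[k] = (b1, b2) are the per-trunk costs of
    year k; p is the discount factor.
    """
    problems = []
    N = len(a)
    for k in range(N):
        a1, a2, a3, a4 = a[k]
        b1, b2 = b[k]
        if not a1 + a2 > b1 - b2:
            problems.append((k, "8a"))
        if not b1 - b2 > 0:
            problems.append((k, "8b"))
        if not a4 > a1 + a2 + a3:
            problems.append((k, "10"))
        if k + 1 < N:                      # (9a), (9b) need next year's costs
            n1, n2, _, _ = a[k + 1]
            m1, m2 = b[k + 1]
            if not a1 + a2 > p * (n1 + n2):
                problems.append((k, "9a"))
            if not b1 - b2 > p * (m1 - m2):
                problems.append((k, "9b"))
    return problems


# Hypothetical constant costs over two years with p = 0.9: no violations.
print(check_cost_assumptions([(1, 1, 1, 10)] * 2, [(2, 1)] * 2, 0.9))
```

Note that with constant costs and no discounting (p = 1), (9a) and (9b) cannot hold. These two conditions rely on discounting or on costs that decline over time.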


2.1.4 Demand servicing constraints 


In general, the underprovision cost or, equivalently, the unsatisfied 
demand penalty cost, involves all of the costs of planned servicing: 
capital, labor, and maintenance, plus a penalty due to the fact that 
demand servicing cannot be carried out with the normal planning 
intervals and orderly procedures associated with planned servicing. It 
is important to note that, in practice, the incremental underprovision
cost depends on a variety of factors such as switch, trunk, and/or 
personnel availability. Typically, if the existing personnel and facilities 
can satisfy the emergency servicing need, then the cost of underpro- 
visioning is marginally higher than the cost of planned trunk augmen- 
tation and can be computed by (7). However, if there is a shortage of 
personnel or facilities, demand servicing becomes much more expen- 
sive (than planned servicing) and highly undesirable. 

Thus, to complete the problem formulation we need to specify that 
the feasible solutions in the TIP model correspond to a level of demand 
servicing not greater than an allowable threshold. In particular, we 
require that the expected amount of demand servicing at year k not 
exceed a specified level 


$$E\{\max(0,\, d(k) - T(k) - u(k))\} \le \beta_k\, E\, d(k),$$

where k = 0, ..., N − 1 and $\beta_k$ is a given constant.
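For a discrete demand distribution, the demand servicing constraint reduces to two expectations. The sketch below uses hypothetical function names and a hypothetical forecast: demand uniform on 0 to 9 modules, with $\beta_k$ = 5 percent. It checks whether a given planned level y satisfies the constraint.

```python
from fractions import Fraction


def expected_shortage(pmf, y):
    """E{max(0, d - y)}: expected amount supplied by demand servicing."""
    return sum(q * max(0, d - y) for d, q in pmf.items())


def meets_servicing_limit(pmf, y, beta):
    """True if planned level y satisfies E{max(0, d - y)} <= beta * E d."""
    mean_demand = sum(q * d for d, q in pmf.items())
    return expected_shortage(pmf, y) <= beta * mean_demand


# Hypothetical forecast: peak demand uniform on 0..9 modules, beta = 5 percent.
pmf = {d: Fraction(1, 10) for d in range(10)}
for y in (7, 8):
    print(y, expected_shortage(pmf, y), meets_servicing_limit(pmf, y, Fraction(1, 20)))
```

Exact fractions are used so that boundary cases are not decided by rounding error.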
2.1.5 Mathematical model


The TIP problem can be viewed as a sequential stochastic decision
process. The state of the system, the number of trunks in service,
varies according to the trunk group dynamics given in eq. (1). At each
state of the process the cost function $g_k(d(k), T(k), u(k))$ is defined via
(4) through (7). The problem then is: given the initial trunk level T(0)
and future peak demand distributions $F_0, F_1, \ldots, F_{N-1}$, find a set of
decisions (augments, disconnects), $u^* = \{u^*(0), \ldots, u^*(N-1)\}$, the
optimal policy, that minimizes the total expected trunk provisioning
cost over N years

$$J(u^*) = \min_u E\Big\{\sum_{k=0}^{N-1} g_k(d(k), T(k), u(k))\Big\} \qquad (11)$$

subject to possible capacity limitation conditions

$$0 \le T(k) + u(k) \le \gamma(k) \qquad (12)$$

and demand servicing constraints

$$E\{\max(0,\, d(k) - T(k) - u(k))\} \le \beta_k\, E\, d(k), \qquad (13)$$

where k = 0, 1, ..., N − 1 and $\gamma(k)$ are given modular thresholds that
represent facility constraints.


III. SOLUTION FOR ONLY-ROUTE TRUNK GROUPS


Throughout the rest of this paper, we assume that the random 
variables d(k) are statistically independent. We shall prove that under 
this assumption the optimal control law to the only-route TIP problem 
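of Section 2.1.5 is defined by N pairs of scalars $(\underline{S}(0), \overline{S}(0)), \ldots,
(\underline{S}(N-1), \overline{S}(N-1))$. The pair $(\underline{S}(k), \overline{S}(k))$ provides two critical
levels for year k. Specifically, at year k, the optimal control law is
to augment the number of trunks up to the level $\underline{S}(k)$ if $T(k) < \underline{S}(k)$,
to maintain the trunk level if $\underline{S}(k) \le T(k) < \overline{S}(k)$, or to disconnect
down to the level $\overline{S}(k)$ if $T(k) \ge \overline{S}(k)$; that is, for k = 0, 1, ..., N − 1
the optimal decision $u_k^* = u_k^*(T(k))$ is given by

$$u_k^*(T(k)) = \begin{cases} \underline{S}(k) - T(k), & T(k) < \underline{S}(k) \\ 0, & \underline{S}(k) \le T(k) < \overline{S}(k) \\ \overline{S}(k) - T(k), & T(k) \ge \overline{S}(k). \end{cases} \qquad (14)$$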
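The control law (14) is a band policy. Once the two critical levels for a year are known, the decision is a three-way comparison. The sketch below is a hypothetical helper. It also applies the facility truncation of (26) from Section 3.1 when a cap `gamma` is supplied.

```python
def tip_decision(T, s_lo, s_up, gamma=None):
    """Optimal augment (+) or disconnect (-) u*(T) of (14), in trunks.

    s_lo, s_up: the year's critical levels; gamma: optional facility cap,
    applied to both levels as in (26) before the comparison.
    """
    if gamma is not None:
        s_lo, s_up = min(s_lo, gamma), min(s_up, gamma)
    if T < s_lo:
        return s_lo - T      # augment up to the lower level
    if T < s_up:
        return 0             # inside the band: leave the group alone
    return s_up - T          # disconnect down to the upper level


# Hypothetical levels of 144 and 192 trunks (24-trunk modules).
print([tip_decision(T, 144, 192) for T in (72, 168, 240)])
```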


We note that the independence assumption for future demands is 
critical for obtaining an analytical solution to the only-route TIP 
problem. However, our numerical experience shows that cost-effec- 
tiveness of the TIP solution as compared to currently used heuristic 
trunk provisioning strategies is not sensitive to this assumption. 

We start our derivation by considering the TIP formulation that 
ignores demand servicing constraints. Specifically, Sections 3.1 and
3.2 present a complete solution to the modular only-route TIP problem 
described by (11) and (12). A similar nonmodular capacity manage- 
ment problem is considered in Ref. 5. In Section 3.3, we generalize 
this solution and obtain a numerically efficient algorithm that com- 
putes an optimal policy under the constraint on the level of demand 
servicing. 


3.1 Optimality conditions

Since the demands are assumed to be independent, the TIP objective
function (2) can be transformed to the form (Ref. 6)

$$\min_u J(u) = \min_{u(0)} \mathop{E}_{d(0)}\Big\{g_0(d(0), T(0), u(0)) + \min_{u(1)} \mathop{E}_{d(1)}\Big\{g_1(d(1), T(1), u(1)) + \cdots + \min_{u(N-1)} \mathop{E}_{d(N-1)}\{g_{N-1}(d(N-1), T(N-1), u(N-1))\} \cdots\Big\}\Big\}.$$

To convert the right-hand side into a recursive form, we introduce the
optimal cost-to-go function for the year k, $V_k(T(k))$, that is, the
minimum expected cost over all possible strategies for years k,
k + 1, ..., N − 1, given T(k) trunks in service at the beginning
of year k. Then, by Bellman’s principle of optimality (Ref. 6), the optimal
policy in year k is obtained by solving the backward dynamic programming
recursion:

$$V_k(T(k)) = \min_{u(k)} \mathop{E}_{d(k)}\big\{g_k(d(k), T(k), u(k)) + p\,V_{k+1}(\max(d(k),\, T(k) + u(k)))\big\}, \qquad (15)$$

where $V_N(T(N)) = 0$ and the minimum is taken over all modular u(k)
such that the planned trunk level at year k satisfies the condition

$$0 \le T(k) + u(k) \le \gamma(k). \qquad (16)$$


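The sum of trunk provisioning costs for year k is composed of two
functions $g_{k1}(\cdot)$ and $g_{k2}(\cdot)$ defined by

$$g_k(d(k), T(k), u(k)) = \begin{cases} g_{k1}(d(k), T(k), u(k)), & u(k) \ge 0 \\ g_{k2}(d(k), T(k), u(k)), & u(k) < 0 \end{cases}$$

$$= \begin{cases} a_1^k u(k) + a_2^k u(k) + a_3^k (T(k) + u(k)) + a_4^k \max(0,\, d(k) - T(k) - u(k)), & u(k) \ge 0 \\ b_1^k u(k) - b_2^k u(k) + a_3^k (T(k) + u(k)) + a_4^k \max(0,\, d(k) - T(k) - u(k)), & u(k) < 0. \end{cases} \qquad (17)$$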


In what follows, we consider $g_{k1}(\cdot)$ and $g_{k2}(\cdot)$ as functions of d(k), T(k), and
y(k) = T(k) + u(k). Then, from the optimality principle and (17), the
year k cost-to-go function is given by

$$V_k(T(k)) = \min\Big\{\min_{y(k) \ge T(k)} J_{k1}(T(k), y(k)),\ \min_{0 \le y(k) < T(k)} J_{k2}(T(k), y(k))\Big\},$$

where

$$J_{k1}(T(k), y(k)) = \mathop{E}_{d(k)}\big\{g_{k1}(d(k), T(k), y(k)) + p\,V_{k+1}(\max(d(k), y(k)))\big\}, \quad y(k) \ge T(k),$$

$$J_{k2}(T(k), y(k)) = \mathop{E}_{d(k)}\big\{g_{k2}(d(k), T(k), y(k)) + p\,V_{k+1}(\max(d(k), y(k)))\big\}, \quad 0 \le y(k) < T(k). \qquad (18)$$

Note that from (17) $J_{k1}(T, y)$ and $J_{k2}(T, y)$ are well-defined functions
for all y = 0, m, 2m, ....
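On a small grid, recursion (15) and (16) with the costs (17) can be solved by exhaustive search. This gives an independent check on the band structure (14) proved below. The sketch is a verification tool, not the TIP algorithm, and its names are hypothetical. It makes these assumptions:

- one trunk per module (m = 1) with per-module costs;
- year-k costs as in (17);
- all demands no larger than the grid;
- ties broken toward the lowest planned level.

```python
from fractions import Fraction


def brute_force_policy(pmf, a, b, p, j_max, gamma=None):
    """Solve (15)-(16) by exhaustive search over planned levels (m = 1).

    pmf[k]: dict demand -> probability for year k (all demands <= j_max).
    a[k] = (a1, a2, a3, a4), b[k] = (b1, b2): per-module costs of (17).
    gamma: optional list of capacity caps gamma(k).
    Returns policy[k][T] = optimal planned level y(k) for T(k) = 0..j_max.
    """
    N = len(pmf)
    states = range(j_max + 1)
    V_next = {T: Fraction(0) for T in states}            # V_N = 0
    policy = [None] * N
    for k in reversed(range(N)):
        a1, a2, a3, a4 = a[k]
        b1, b2 = b[k]
        cap = j_max if gamma is None else min(gamma[k], j_max)
        V, pol = {}, {}
        for T in states:
            best_y, best_cost = None, None
            for y in range(cap + 1):                       # constraint (16)
                u = y - T
                move = (a1 + a2) * u if u >= 0 else (b1 - b2) * u
                cost = sum(q * (move + a3 * y + a4 * max(0, d - y)
                                + p * V_next[max(d, y)])
                           for d, q in pmf[k].items())     # J_k1 / J_k2 of (18)
                if best_cost is None or cost < best_cost:
                    best_y, best_cost = y, cost
            V[T], pol[T] = best_cost, best_y
        V_next, policy[k] = V, pol
    return policy


# Hypothetical data: demand uniform on 0..9 modules, a = (1,1,1,10), b = (1/2, 1/10).
pmf = {d: Fraction(1, 10) for d in range(10)}
a = (Fraction(1), Fraction(1), Fraction(1), Fraction(10))
b = (Fraction(1, 2), Fraction(1, 10))
print(brute_force_policy([pmf], [a], [b], Fraction(9, 10), j_max=12)[0])
```

For this single-year example, the search returns the same band structure that (14) predicts. It augments any smaller group to 6 modules, leaves groups of 6 to 8 modules alone, and disconnects larger groups down to 8.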

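To prove that a solution of (18) is given by (14) we first develop
sufficient conditions for the optimality of an $(\underline{S}, \overline{S})$ policy without
capacity constraints ($\gamma = \infty$). We observe that $J_{k1}(T, T) = J_{k2}(T, T)$
for all $T \ge 0$. Accordingly, to prove the optimality of (14) it is sufficient
to show that the solutions of the minimization problems

$$\min_{y \ge T \ge 0} J_1(T, y) \qquad (19)$$

and

$$\min_{T \ge y \ge 0} J_2(T, y) \qquad (20)$$

are defined by

$$y^* = \begin{cases} \underline{S}, & T < \underline{S} \\ T, & T \ge \underline{S} \end{cases} \qquad (21)$$

and

$$y^* = \begin{cases} \overline{S}, & T > \overline{S} \\ T, & T \le \overline{S}, \end{cases} \qquad (22)$$

respectively, for some numbers $\underline{S}$ and $\overline{S}$ such that
$0 \le \underline{S} \le \overline{S} < \infty$.

To simplify the notation, we have dropped the index k in (19) through
(22) and in the formulas in the rest of this section.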

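Consequently, the optimality of (21) and (22) is evident if there
exists a pair of numbers $(\underline{S}, \overline{S})$, $\underline{S} \le \overline{S}$, such that for any $T \ge 0$

$$J_1(T, x) \ge J_1(T, y), \quad 0 \le x \le y \le \underline{S}; \qquad J_1(T, x) \le J_1(T, y), \quad \underline{S} \le x \le y < \infty \qquad (23)$$

and

$$J_2(T, x) \le J_2(T, y), \quad \overline{S} \le x \le y < \infty; \qquad J_2(T, x) \ge J_2(T, y), \quad 0 \le x \le y \le \overline{S}. \qquad (24)$$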


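To demonstrate (23) and (24) and to construct $\underline{S}$ and $\overline{S}$ we shall
prove an even stronger statement. Specifically, we consider the first
differences of $J_1$ and $J_2$, defined by

$$\Delta J_1(T, y) = J_1(T, y + m) - J_1(T, y)$$

and

$$\Delta J_2(T, y) = J_2(T, y + m) - J_2(T, y),$$

and show that $\Delta J_1$ and $\Delta J_2$ satisfy the conditions:

1. $\Delta J_1(T, y) = \Delta J_1(y)$ and $\Delta J_2(T, y) = \Delta J_2(y)$, that is, neither first difference depends on T,
2. $\Delta J_1(y)$ and $\Delta J_2(y)$ are nondecreasing, and
3. $0 \le \underline{S} \le \overline{S} < \infty$,

where

$$\underline{S} = \min\{y \mid \Delta J_1(y) \ge 0,\ y = 0, m, \ldots\}$$

and

$$\overline{S} = \min\{y \mid \Delta J_2(y) \ge 0,\ y = 0, m, \ldots\}. \qquad (25)$$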


Note that (25) defines the minimum points of $J_1$ and $J_2$, respectively.
Furthermore, conditions 1 through 3 not only imply the optimality
of (14) but also suggest that we can find the critical thresholds
efficiently by applying a modular version of the bisection method to
the first differences of $J_1$ and $J_2$. The proof that $J_1$ and $J_2$ [defined
by (18)] satisfy these conditions is given in Appendix A.
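Because condition 2 makes each first difference nondecreasing, the threshold in (25) is the first grid point at which the difference turns nonnegative. That point can be found by bisection over module indices. The sketch below assumes the caller supplies `delta` as a callable on trunk levels and a grid end `j_max * m` at which `delta` is already nonnegative. Both names are hypothetical.

```python
def modular_bisection(delta, m, j_max):
    """Smallest y in {0, m, ..., j_max*m} with delta(y) >= 0, as in (25).

    Valid only because delta is nondecreasing (condition 2); requires
    delta(j_max * m) >= 0 so that the threshold lies on the grid.
    """
    if delta(j_max * m) < 0:
        raise ValueError("no threshold on the grid; enlarge j_max")
    lo, hi = 0, j_max                 # module indices; answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if delta(mid * m) >= 0:
            hi = mid                  # threshold is at mid or below
        else:
            lo = mid + 1              # threshold is above mid
    return lo * m


# Hypothetical first difference crossing zero at 50 trunks, 24-trunk modules.
print(modular_bisection(lambda y: y - 50, 24, 10))
```

Each threshold costs O(log j_max) evaluations of the first difference, compared with O(j_max) for a linear scan.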

The generalization to the constrained case follows immediately.
Because of the monotonicity of $\Delta J_1(y)$ and $\Delta J_2(y)$, the optimal solution
under capacity constraints is given by

$$\underline{S}^* = \min[\underline{S}, \gamma] \quad \text{and} \quad \overline{S}^* = \min[\overline{S}, \gamma]. \qquad (26)$$


3.2 Computational procedure


As we prove in Appendix A, an optimal modular TIP policy is
described by N pairs of critical threshold levels
$\{(\underline{S}(0), \overline{S}(0)), \ldots, (\underline{S}(N-1), \overline{S}(N-1))\}$, and each pair
$(\underline{S}(k), \overline{S}(k))$ can be computed by (25).

Consequently, to obtain an algorithm for calculating the optimal
policy we shall derive backward recursions for $\Delta J_{k1}(T(k) + u(k))$ and
$\Delta J_{k2}(T(k) + u(k))$. In Appendix A we show that

$$\Delta J_{k1}(y(k)) = [a_1^k + a_2^k + a_3^k - a_4^k(1 - F_k(y(k)))]m + p\,\Delta \mathop{E}_{d(k)} V_{k+1}(\max(d(k), y(k))) \qquad (27)$$

and

$$\Delta J_{k2}(y(k)) = [b_1^k - b_2^k + a_3^k - a_4^k(1 - F_k(y(k)))]m + p\,\Delta \mathop{E}_{d(k)} V_{k+1}(\max(d(k), y(k))), \qquad (28)$$

where y(k) = T(k) + u(k).

Thus, we can confine our effort to the derivation of the backward recursion for the first difference of the expected optimal cost-to-go function defined by

$$\Delta E_{d(k)} V_{k+1}[\max(d(k), y(k))] = E_{d(k)} V_{k+1}[\max(d(k), y(k) + m)] - E_{d(k)} V_{k+1}[\max(d(k), y(k))]. \qquad (29)$$


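The details of our derivation are presented in Appendix B; we demonstrate that

$$\Delta E_{d(k)} V_{k+1}[\max(d(k), y(k))] = F_k(y(k)) \cdot \begin{cases} -(a_1^{k+1} + a_2^{k+1})m, & y(k) < \underline{S}(k+1), \\ [a_3^{k+1} - a_4^{k+1}(1 - F_{k+1}(y(k)))]m + \rho\, \Delta E_{d(k+1)} V_{k+2}[\max(d(k+1), y(k))], & \underline{S}(k+1) \le y(k) < \bar{S}(k+1), \\ -(b_1^{k+1} - b_2^{k+1})m, & y(k) \ge \bar{S}(k+1). \end{cases} \qquad (30)$$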


Consequently, the backward recursions (27), (28), and (30) and formulas (25) define an algorithm that yields the complete set of optimal threshold levels $\{(\underline{S}(N-1), \bar{S}(N-1)), \ldots, (\underline{S}(0), \bar{S}(0))\}$ for the problem described by (15) and (16).
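A compact sketch of this backward pass follows. Several things in it are assumptions made only for illustration: each demand distribution $F_k$ is taken to be normal with hypothetical per-year means and standard deviations, the cost coefficients are held constant across years, and trunk levels are searched on a finite grid $0, m, \ldots$ up to a hypothetical bound `y_max`. A linear scan of the first-difference array stands in for the bisection search, since the whole array is computed anyway.

```python
import math
from typing import List, Sequence, Tuple


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """F_k(x) = P(d(k) <= x); a normal demand model is assumed for illustration."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def tip_thresholds(mu: Sequence[float], sigma: Sequence[float],
                   a: Tuple[float, float, float, float], b: Tuple[float, float],
                   rho: float, m: int, y_max: int) -> List[Tuple[int, int]]:
    """Thresholds (S_low(k), S_high(k)), k = 0..N-1, via (25), (27), (28), (30).

    Costs are year-independent here: a = (a1, a2, a3, a4), b = (b1, b2).
    """
    a1, a2, a3, a4 = a
    b1, b2 = b
    n = len(mu)
    grid = list(range(0, y_max + 1, m))
    # dEV[i] = Delta E V_{k+1}[max(d(k), grid[i])]; zero at k = N - 1 because V_N = 0.
    dEV = [0.0] * len(grid)
    out: List[Tuple[int, int]] = [(0, 0)] * n
    for k in range(n - 1, -1, -1):
        F = [normal_cdf(y, mu[k], sigma[k]) for y in grid]
        dJ1 = [(a1 + a2 + a3 - a4 * (1 - F[i])) * m + rho * dEV[i] for i in range(len(grid))]  # (27)
        dJ2 = [(b1 - b2 + a3 - a4 * (1 - F[i])) * m + rho * dEV[i] for i in range(len(grid))]  # (28)
        s_low = next((y for y, v in zip(grid, dJ1) if v >= 0), grid[-1])   # (25), capped at grid end
        s_high = next((y for y, v in zip(grid, dJ2) if v >= 0), grid[-1])
        out[k] = (s_low, s_high)
        if k == 0:
            break
        # (30) shifted back one year: Delta E V_k[max(d(k-1), y)] from the year-k thresholds.
        F_prev = [normal_cdf(y, mu[k - 1], sigma[k - 1]) for y in grid]
        new_dEV = []
        for i, y in enumerate(grid):
            if y < s_low:
                g = -(a1 + a2) * m
            elif y < s_high:
                g = (a3 - a4 * (1 - F[i])) * m + rho * dEV[i]
            else:
                g = -(b1 - b2) * m
            new_dEV.append(F_prev[i] * g)
        dEV = new_dEV
    return out


# Single year: reduces to the critical ratios of Appendix A.
print(tip_thresholds([100.0], [15.0], (10.0, 2.0, 1.0, 30.0), (4.0, 2.0), 0.9, 1, 300))  # [(103, 120)]
# Hypothetical growing demand over four years, modules of 4 trunks.
print(tip_thresholds([100.0, 110.0, 120.0, 130.0], [15.0, 18.0, 21.0, 24.0],
                     (10.0, 2.0, 1.0, 30.0), (4.0, 2.0), 0.9, 4, 400))
```

Storing the first differences rather than the cost-to-go values themselves mirrors the paper's point that only (27), (28), and (30) are needed to locate the thresholds.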


3.3 Final solution of the only-route TIP problem 


Now we apply the Lagrangean relaxation approach to show that the computational procedure of Section 3.2 can be used to find an optimal solution to the original TIP formulation (11) through (13) that includes demand servicing constraints. Indeed, let us consider the functional

$$H(u, \lambda) = J(u) + \sum_{k=0}^{N-1} \lambda_k \rho^k E \max(0, d(k) - T(k) - u(k)), \qquad (31)$$

where $\lambda = (\lambda_0, \ldots, \lambda_{N-1})$.

Clearly, for any given $\lambda \ge 0$, minimizing $H(u, \lambda)$ with respect to $u$ subject to (16) is equivalent to solving the problem described by (11) and (12) with the incremental underprovision cost $a_4^k$ replaced by

$$\tilde{a}_4^k = a_4^k + \lambda_k.$$


To demonstrate that a solution to the problem (12) and (31) (with appropriately fixed $\lambda$) is, in fact, a solution to the original TIP problem (11) through (13), we need to prove the following general proposition:

Let $u^*$ be a minimum of the functional

$$J(u) + \sum_{k=0}^{N-1} \lambda_k E_k(u)$$

for all $u \in U$, where $J(u)$ and $E_k(u)$ are arbitrary real-valued functions, $U$ is some set of admissible controls, and $\lambda \ge 0$. Then $u^*$ is a solution to the problem

$$\min_{u \in U} J(u)$$

subject to

$$E_k(u) \le E_k(u^*) \qquad (32)$$

for all $k$ such that $\lambda_k > 0$.
The proof of this proposition is simple. By our hypothesis, for all $u \in U$ we have

$$J(u^*) + \sum_{k=0}^{N-1} \lambda_k E_k(u^*) \le J(u) + \sum_{k=0}^{N-1} \lambda_k E_k(u),$$

or

$$J(u^*) \le J(u) + \sum_{k=0}^{N-1} \lambda_k [E_k(u) - E_k(u^*)]. \qquad (33)$$

Since $\lambda$ is a nonnegative vector, the second term on the right-hand side of (33) is nonpositive for any $u \in U$ such that conditions (32) are satisfied. Therefore,

$$J(u^*) \le J(u)$$

for all admissible controls $u$ such that $E_k(u) \le E_k(u^*)$ when $\lambda_k > 0$.
Q.E.D.
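The proposition can be checked numerically on a toy instance. The sketch below is hypothetical in every detail: it draws arbitrary values of $J(u)$ and $E_k(u)$ for a finite set $U$ of 200 controls, picks a nonnegative $\lambda$ with some components equal to zero, minimizes the Lagrangian by enumeration, and then confirms that no control satisfying (32) has a smaller $J$.

```python
import random


def check_lagrangian_proposition(seed: int = 0, n_controls: int = 200, n_years: int = 3) -> bool:
    """Brute-force check of the proposition on a random finite control set U."""
    rng = random.Random(seed)
    # Hypothetical data: J(u) and E_k(u) are arbitrary real numbers for each control u.
    J = [rng.uniform(0, 100) for _ in range(n_controls)]
    E = [[rng.uniform(0, 10) for _ in range(n_years)] for _ in range(n_controls)]
    lam = [rng.choice([0.0, rng.uniform(0, 5)]) for _ in range(n_years)]  # lambda >= 0

    # u* minimizes J(u) + sum_k lambda_k E_k(u) over U.
    u_star = min(range(n_controls),
                 key=lambda u: J[u] + sum(l * e for l, e in zip(lam, E[u])))

    # Every u that satisfies (32) for the years with lambda_k > 0 must have J(u) >= J(u*).
    for u in range(n_controls):
        feasible = all(E[u][k] <= E[u_star][k] for k in range(n_years) if lam[k] > 0)
        if feasible and J[u] < J[u_star]:
            return False
    return True


print(check_lagrangian_proposition())  # True
```

Note that the check places no restriction on the years with $\lambda_k = 0$, exactly as the proposition states.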
This result implies that the optimal solution of the original TIP problem is also of the $(\underline{S}, \bar{S})$ type. Moreover, the proposition shows that if we can find an optimal solution to the TIP problem (11) and (12) with some incremental underprovision costs $\tilde{a}_4^k = a_4^k + \lambda_k$, $\lambda_k > 0$, and if this solution results in expected demand servicing equal to $100\beta_k$ percent, then it is an optimal solution to the problem described by (11) through (13).

To utilize this result we need to derive formulas for computing the expected demand servicing level for a given policy $\pi = \{(\underline{S}(0), \bar{S}(0)), \ldots, (\underline{S}(N-1), \bar{S}(N-1))\}$. The derivation is given in Appendix C.


Now we can describe an algorithm to solve the TIP problem described by (11) through (13). A numerical procedure for obtaining the TIP solution can be outlined as follows:

Step 1—Set the initial vector of the Lagrange multipliers:

$$\lambda = 0 \qquad (34)$$

and identify the set $K$ of years $k$ for which the demand servicing constraint level would be exceeded unless a positive value were set for $\lambda_k$.

Step 2—Using the computational procedure described in Section 3.2, determine

$$\pi = \{(\underline{S}(0), \bar{S}(0)), \ldots, (\underline{S}(N-1), \bar{S}(N-1))\}$$

that minimizes

$$H(u, \lambda). \qquad (35)$$
Step 3—Using the Step 2 solution $\pi$, for all years $k \in K$ calculate

$$\eta_k(\lambda) = E \max(0, d(k) - T(k) - u(k)) - \beta_k E d(k) \qquad (36)$$

and determine whether $|\eta_k| < \epsilon_k$, where $\epsilon_k > 0$ is some tolerance level. If the tolerance level $\epsilon_k$ is satisfied for every $k$, stop; otherwise go to Step 4.

Step 4—To reduce $|\eta_k|$, go back to Step 2 with $\lambda$ replaced by

$$\lambda + w\,\eta(\lambda), \qquad (37)$$

where $w$ is an appropriate positive constant. (The constant $w$ can be adjusted by trial and error to speed up the iteration procedure.)
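Steps 1 through 4 can be illustrated on a deliberately simplified single-year case. In this sketch the demand is assumed normal, the planned level is treated as continuous rather than modular, and Step 2 reduces to the single-stage critical-ratio rule of Appendix A, eq. (58), with $a_4$ replaced by $a_4 + \lambda$. The function names, the step size $w = 5$, the tolerance, and the clipping of $\lambda$ at zero are our assumptions, not part of the procedure as stated.

```python
from statistics import NormalDist
from typing import Tuple

Costs = Tuple[float, float, float, float]


def planned_level(lam: float, demand: NormalDist, a: Costs) -> float:
    """Step 2 stand-in: single-stage critical ratio (58) with a4 replaced by a4 + lambda."""
    a1, a2, a3, a4 = a
    return demand.inv_cdf(1.0 - (a1 + a2 + a3) / (a4 + lam))


def expected_shortfall(y: float, demand: NormalDist) -> float:
    """E max(0, d - y) for normal demand (the normal loss function)."""
    z = (y - demand.mean) / demand.stdev
    std = NormalDist()
    return demand.stdev * (std.pdf(z) - z * (1.0 - std.cdf(z)))


def lagrange_iteration(demand: NormalDist, a: Costs, beta: float,
                       w: float = 5.0, tol: float = 1e-3, max_iter: int = 1000) -> Tuple[float, float]:
    lam = 0.0                                                    # Step 1, (34)
    for _ in range(max_iter):
        y = planned_level(lam, demand, a)                        # Step 2
        eta = expected_shortfall(y, demand) - beta * demand.mean  # Step 3, (36)
        if abs(eta) < tol or (lam == 0.0 and eta < 0.0):         # converged, or constraint slack at lambda = 0
            return lam, y
        lam = max(0.0, lam + w * eta)                            # Step 4, (37); clipping is our assumption
    raise RuntimeError("no convergence; try a smaller w")


lam, y = lagrange_iteration(NormalDist(100, 15), (10.0, 2.0, 1.0, 30.0), beta=0.02)
print(round(lam, 1), round(y, 1))
```

Because the expected shortfall decreases as $\lambda$ grows, a positive $\eta$ pushes $\lambda$ up and a negative one pulls it down; too large a $w$ can make the iteration oscillate, which is the trial-and-error tuning the text mentions.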

In general, the Lagrange multipliers vary from year to year depending on the demand characteristics, constraint levels, and cost coefficients. Our numerical experience revealed, however, that under reasonable assumptions for TIP application [growing (declining) demand pattern, $\beta_k = \beta$, and $a_i^k = a_i$], the vector of Lagrange multipliers $\lambda^*$ can be approximated by $\lambda' = (\lambda', \ldots, \lambda')$, where the constant $\lambda'$ depends only on a given level of demand servicing $\beta$ and the coefficient of variation of the demand.

Using the approach outlined in Appendix C, we verified that the approximation of $\lambda^*$ by $\lambda'$ does not result in a significant cost penalty, i.e.,

$$\min_u H(u, \lambda^*) \approx \min_u H(u, \lambda').$$

Consequently, to obtain a numerically efficient TIP algorithm and to facilitate TIP implementation, we developed a conversion table (as illustrated in Fig. 3) that defines $\lambda'$ as a function of the demand servicing constraint level $\beta$.


IV. NETWORK TIP 


This section extends the only-route TIP to determine a multiyear 
schedule of trunk augments and disconnects for a hierarchical network 
that minimizes the present worth of the expected cost of planned and 
demand servicing subject to capacity and demand servicing con- 
straints. 

We note that there is a fundamental difference between the only-route and the hierarchical network TIP problems. The only-route TIP problem answers two basic questions: first, how much extra capacity is needed on a trunk group to hedge against forecast uncertainty and/or to satisfy the constraint on the amount of demand servicing; second, how should the year-by-year trunk requirements be smoothed to obtain the optimal balance between the costs of maintaining and rearranging trunks over a planning horizon. In the network case the major additional question is to determine where to provide extra capacity in the network, that is, should this additional capacity be provided on all of the trunk groups in the network or on specific trunk groups only?

[Plot: $\lambda'/(a_1 + a_2 + a_3)$ (0 to 3.0) versus $\beta'$ (0 to 45). $\beta'$ is an allowable percentage of trunks added in demand servicing normalized by the coefficient of variation of the demand, $\beta' = \beta/\mathrm{cvar}$.]
Fig. 3—Approximation of Lagrange multipliers.


4.1 Overview of network TIP solution 


To develop a network TIP algorithm, we exploit the heuristic 
principles used in conventional trunk group sizing procedures. In 
particular, we decouple the network TIP problem into individual 
cluster TIP problems, where a cluster is defined by a final trunk group 
and all subtending high-usage groups that overflow to that final group 
(Fig. 2 illustrates trunking terminology). To simplify the analysis we 
shall assume that the demand servicing policy is to augment only the 
final group when the blocking objective is violated. Our numerical 
experience shows that if an unbiased traffic load forecasting algorithm 
is used (such as in Ref. 3), then our demand servicing policy assump- 
tion is not critical to the optimality of the final TIP solution. Conse- 
quently, in this section we present a cluster-sizing procedure that 
minimizes the expected present worth of planned servicing expendi- 
tures on each high-usage (HU) trunk group plus the planned and 
demand servicing expenditures on the corresponding final trunk group 
subject to facility and demand servicing constraints. 

The key idea of our solution is based on a heuristic argument that 
suggests that a near optimal solution can be obtained by accounting 
for forecast uncertainty and demand servicing constraints on the final 
trunk group only. We draw this conclusion from our analysis in Section 
4.2, where we consider a single-year TIP problem. Extending this to a 
multiyear case we then assume that Truitt’s engineering procedure, 
the ECCS rule,’ can be used to find single-year, initial HU trunk 
requirements that then will be adjusted to eliminate uneconomical 
trunk group rearrangements. 

Accordingly, in Section 4.3 we derive an optimal disconnect policy 
for HU trunk groups, that is, we show how to satisfy the ECCS trunk 
requirements for primary HU trunk groups while minimizing the 
present worth of the planned servicing cost over a planning horizon. 
As Fig. 4 shows, when the initial five-year TIP solution on primary HU groups is obtained, the TIP algorithm proceeds by calculating overflow traffic and sizing the intermediate HU trunk groups using, again, Truitt's engineering procedure and an optimal disconnect policy. After all subtending HU groups have been sized, the final trunk group becomes the last and only choice for the remaining traffic to reach its destination. Consequently, the capacity expansion planning problem for the final trunk group reduces to an only-route TIP problem (11) through (13).

Fig. 4—Network TIP algorithm.

In Section 4.4 we show that under certain circumstances the initial 
cluster TIP solution may provide less (more) trunk capacity on the 
final trunk group than that necessary to satisfy the blocking objective 
and the demand servicing constraint. In that case we show how to 
improve the initial TIP solution by increasing (reducing) the sizes of 
the subtending HU groups in an economical fashion. 


4.2 Alternate routing under uncertainty 


To analyze the impact of forecast error on the optimal design of hierarchical networks, we first formulate a single-year TIP problem for a Truitt alternate routing triangle.” Referring to Fig. 5, we assume that load $(A + \epsilon)$ is offered to the direct (HU) group, and background loads $(A_1 + \epsilon_1)$ and $(A_2 + \epsilon_2)$ are offered to the first and second legs of the alternate route, respectively, where $\epsilon$, $\epsilon_1$, $\epsilon_2$ are the errors in the load forecast. The trunk group sizes on the direct and alternate routes are denoted by $T$, $T_1$, and $T_2$, respectively. Then the problem of determining $T$, $T_1$, $T_2$ to minimize the expected cost of trunk provisioning activities during the year is given by

$$\min_{T, T_1, T_2} E[C_P T + C_{A1} T_1 + C_{A2} T_2 + C_{S1} \max(0, d_1 - T_1) + C_{S2} \max(0, d_2 - T_2)], \qquad (38)$$


where $d_1$ and $d_2$ are the numbers of trunks required on the alternate route to satisfy the network service objective; $C_P$ is the incremental cost of adding a trunk to the direct route on a planned basis; $C_{A1}$, $C_{A2}$ and $C_{S1}$, $C_{S2}$ are the incremental costs of planned and demand servicing on the alternate route legs. Also, in (38) we assume that when the calculated number of trunks required exceeds the number of trunks in service, the demand servicing augmentation is performed on the final group only.

Fig. 5—Alternate routing under uncertainty.

Our goal is to minimize (38) under the demand servicing constraints given by

$$E \max(0, d_1 - T_1) \le \beta E d_1$$
and
$$E \max(0, d_2 - T_2) \le \beta E d_2, \qquad (39)$$

where the expected value is taken with respect to $(\epsilon, \epsilon_1)$ and $(\epsilon, \epsilon_2)$, respectively.

In our numerical studies we investigated the cases in which 10 to 100 erlangs of traffic are offered to the direct group and 10 to 500 erlangs are offered to the alternate route. We also assumed that the coefficient of variation of the demand forecast on the HU group, $CV_P$, and on each leg of the alternate route, $CV_{A1}$ and $CV_{A2}$, varies from 0.0 to 0.25, and that the demand servicing threshold, $\beta$, is in the range from 5 to 30 percent. Finally, we assumed that the forecast errors $\epsilon$, $\epsilon_1$, and $\epsilon_2$ are statistically independent.




4.2.1 Major conclusions 


We compared the optimal single-year TIP solution with the solution 
that sizes the direct route by Truitt’s formula and then sizes the final 
to satisfy the blocking objective and demand servicing constraint at 
minimum cost. Our sensitivity analysis revealed that the optimal trunk 
requirement on the HU trunk group does not change significantly with 
changes in the coefficient of variation of the forecast, i.e., the optimal 
solution accounts for uncertainty by providing extra capacity, mainly 
on the final trunk group. More importantly, the cost difference between the optimal and the ECCS-based solutions is less than 1 percent. Thus, we conclude that the expected trunk provisioning cost function is very flat in the neighborhood of a solution point and that Truitt's HU solution is relatively close to the optimal HU trunk size.

To exploit the single-year ECCS design as a basis for a five-year 
trunk plan on HU groups, we next address the question of how to 
adjust the single-year trunk requirements to obtain an economical 
trunk implementation schedule for a given planning horizon. 


4.3 Optimal disconnect policy for high-usage groups 


We shall use the notions and notation of Section 2.1, except that $d(k)$ now represents the deterministic (rather than random) ECCS trunk requirement at year $k$.

As explained in Section 3.3, we omit the demand servicing constraint 
while sizing HU trunk groups. Consequently, for HU trunk groups the 
objective is to fulfill the ECCS trunk requirements at minimum cost, 
i.e., to minimize 


$$\sum_{k=0}^{N-1} \rho^k [c_1^k(u(k)) + c_2^k(u(k)) + c_3^k(T(k), u(k))] \qquad (40)$$

subject to the ECCS constraints

$$T(k) + u(k) \ge d(k),$$

where $T(0)$ is given and $T(k+1)$ is defined by

$$T(k+1) = T(k) + u(k). \qquad (41)$$

Under conditions (8) and (9) we will show that the optimal decision $u^*(k)$ has the following form:

$$u^*(k) = \begin{cases} d(k) - T(k), & \text{if } d(k) \ge T(k), \\ \min\left\{\max_{i=0,\ldots,j^*} d(k+i) - T(k),\ 0\right\}, & \text{if } d(k) < T(k), \end{cases} \qquad (42)$$

where $j^*$ is the largest integer $j$ such that

$$b_1^k - b_2^k + \sum_{i=0}^{j-1} \rho^i a_3^{k+i} < \rho^j (a_1^{k+j} + a_2^{k+j}). \qquad (43)$$
Note that in (43) the cost coefficients have superscripts, while only $\rho$ is raised to a power.

Condition (43) states that the present worth of money recovered by disconnecting a trunk module in year $k$ and maintaining $T(k+1) - m$ trunks is less than the present worth of a trunk module purchase and installation in year $k + j^*$, but is greater than this cost in year $k + j^* + 1$. The second half of the policy (42) dictates that if $T(k)$ or more trunks are required in some year $k, k+1, \ldots, k+j^*$, no trunks are disconnected. Otherwise, trunks are disconnected to the lowest possible level not requiring any reconnections prior to year $k + j^* + 1$.

To demonstrate that $u^*(k) = d(k) - T(k)$ if $d(k) \ge T(k)$, we need only show that if $u(k) > d(k) - T(k)$, then the corresponding control strategy $u^1 = (u(0), \ldots, u(N-1))$ can be improved by the strategy $u^2 = (u(0), \ldots, u(k) - m, u(k+1) + m, \ldots, u(N-1))$, where $u^2$ is necessarily feasible. To show this we consider the two possible scenarios:

$$u(k+1) \ge 0 \quad \text{and} \quad u(k+1) < 0.$$

For $u(k+1) \ge 0$, the cost difference $L(u^1) - L(u^2)$ of the two strategies $u^1$, $u^2$ is

$$L(u^1) - L(u^2) = \rho^k m(a_1^k + a_2^k + a_3^k) - \rho^{k+1} m(a_1^{k+1} + a_2^{k+1}).$$

Then from (27) and the positivity of $a_3^k$, $L(u^1) > L(u^2)$. Similarly, when $u(k+1) < 0$ we obtain

$$L(u^1) - L(u^2) = \rho^k m(a_1^k + a_2^k + a_3^k) - \rho^{k+1} m(b_1^{k+1} - b_2^{k+1}), \qquad (44)$$

or, from (8) and (9), $L(u^1) > L(u^2)$.

To prove the second part of statement (42) we consider two cases. First, let us show that if we have a control strategy $u^1 = (u(0), \ldots, u(N-1))$ that disconnects fewer trunk modules in year $k$ than suggested by (42), i.e.,

$$\max_{0 \le i \le j^*}\{d(k+i)\} - T(k) < u(k) \le 0, \qquad (45)$$

then $u^1$ can be improved.

First assume that $u(k+j) < 0$ for some $j$ such that $1 \le j \le j^*$. Let $k + j$ be the first such year. Then a better feasible policy is given by $u^2 = (u(0), \ldots, u(k) - m, \ldots, u(k+j) + m, \ldots)$. Indeed, for the difference in planned servicing cost we get

$$L(u^1) - L(u^2) = \rho^k m\left(b_1^k - b_2^k + \sum_{i=0}^{j-1} \rho^i a_3^{k+i}\right) - \rho^{k+j} m(b_1^{k+j} - b_2^{k+j}), \qquad (46)$$

and from (9),

$$L(u^1) - L(u^2) > 0.$$

If there is no year $k + j$ $(1 \le j \le j^*)$ for which $u(k+j) < 0$, then from (43) a less expensive feasible solution is presented by $u^2 = (u(0), \ldots, u(k) - m, \ldots, u(k + j^* + 1) + m, \ldots)$.

Now we consider the second case and demonstrate that if we disconnect more trunks than specified by (42), i.e., $u(k) < u^*(k) = \max_{0 \le i \le j^*}\{d(k+i)\} - T(k)$, then the solution $u^1$ can be improved by $u^2 = (u(0), \ldots, u(k) + m, \ldots, u(k+j) - m, \ldots)$, where $j$ is such that $d(k+j) = \max_{0 \le i \le j^*}\{d(k+i)\}$. By the first part of (42) we add only as many trunks as needed. Therefore, if $u^1(k) < u^*(k)$, then we can assume that $u(k+j) > 0$. Consequently, we get

$$L(u^1) - L(u^2) = -m\rho^k\left[b_1^k - b_2^k + \sum_{i=0}^{j-1} \rho^i a_3^{k+i} - \rho^j(a_1^{k+j} + a_2^{k+j})\right],$$

and from (43) we conclude

$$L(u^1) - L(u^2) > 0.$$

The proof is complete.

Using the intuitively appealing solution described by (42), we can construct a simple, numerically efficient scheme that evaluates the optimal policy $u^*$:

Step 1—If $d(k) - T(k)$ is positive, set $u^*(k) = d(k) - T(k)$ and go to Step 3.

Step 2—If $d(k) - T(k) \le 0$, find the maximum $j$ for which (43) is satisfied; that is, find

$$j^* = \max\left\{j \;\middle|\; 0 \le j \le N-1-k,\ b_1^k - b_2^k + \sum_{i=0}^{j-1} \rho^i a_3^{k+i} < \rho^j (a_1^{k+j} + a_2^{k+j})\right\}.$$

Then compute $d^* = \max_{0 \le i \le j^*} d(k+i)$. If $d^* \ge T(k)$, set $u^*(k) = 0$ and go to Step 3; if $d^* < T(k)$, set $u^*(k) = d^* - T(k)$ and go to Step 3.

Step 3—If $k = N - 1$, stop. Otherwise, set $T(k+1) = T(k) + u^*(k)$, replace $k$ by $k + 1$, and go to Step 1.

If $d(k) > T(k)$, the first step of the algorithm simply augments the trunk group up to the ECCS requirement at year $k$. The second step determines how many trunks to disconnect if the current requirement is less than the number of trunks in service.
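Steps 1 through 3 translate almost line for line into code. The sketch below assumes year-independent cost coefficients ($a_1$, $a_2$, $a_3$, $b_1$, $b_2$ without superscripts), ECCS requirements already rounded to whole modules, and a direct evaluation of (43) for each candidate $j$; the function names and the sample data are illustrative only.

```python
from typing import List, Sequence


def largest_j(k: int, n_years: int, a1: float, a2: float, a3: float,
              b1: float, b2: float, rho: float) -> int:
    """j* from (43): largest j in [0, N-1-k] with
    b1 - b2 + sum_{i<j} rho^i a3 < rho^j (a1 + a2)."""
    best = 0
    for j in range(0, n_years - k):
        recovered = b1 - b2 + sum(rho ** i * a3 for i in range(j))
        if recovered < rho ** j * (a1 + a2):
            best = j
    return best


def hu_schedule(d: Sequence[int], T0: int, a1: float, a2: float, a3: float,
                b1: float, b2: float, rho: float) -> List[int]:
    """Steps 1-3 of Section 4.3: u*(0), ..., u*(N-1) for deterministic ECCS requirements d."""
    n_years = len(d)
    T, u = T0, []
    for k in range(n_years):
        if d[k] - T > 0:                    # Step 1: augment up to the requirement
            uk = d[k] - T
        else:                               # Step 2: look ahead j* years
            j_star = largest_j(k, n_years, a1, a2, a3, b1, b2, rho)
            d_star = max(d[k:k + j_star + 1])
            uk = 0 if d_star >= T else d_star - T
        u.append(uk)
        T += uk                             # Step 3: T(k+1) = T(k) + u*(k)
    return u


print(hu_schedule([10, 14, 12, 8, 9, 15, 6], 12, 10, 2, 3, 4, 2, 0.9))  # [0, 2, -2, 0, 0, 3, -9]
```

With these hypothetical costs $j^* = 2$, so the two trunks added in year 1 are released in year 2 (no higher requirement within the two-year look-ahead), whereas the dip to 8 trunks in year 3 triggers no disconnect because 15 trunks are needed two years later.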


4.4 Final solution for HU trunk groups 


The TIP algorithm described in Section 4.1 constructs a near-optimal schedule of trunk augments and disconnects by smoothing the ECCS high-usage trunk group requirements and by accounting for forecast uncertainty on final trunk groups only. As we stated in Section 3.1, the TIP solution on final groups is defined by $N$ pairs of critical thresholds $(\underline{S}^*(k), \bar{S}^*(k))$, $k = 0, \ldots, N-1$. In practice, it is quite possible that because of the condition

$$T(k) + u(k) \le \bar{y}(k),$$

the final group cannot be augmented to satisfy the blocking objective and the demand servicing constraint at year $k$. In that case we show how to adjust the sizes of subtending HU groups to reduce the load on the final.

Thus, in the development to follow we assume that $\underline{S}(k)$ found by (25) is greater than the constraint level at year $k$,

$$\underline{S}(k) > \bar{y}(k),$$

and, therefore, from eq. (26),

$$\underline{S}^*(k) = \bar{S}^*(k) = \bar{y}(k).$$


We note that the lower optimal threshold $\underline{S}(k)$ represents the minimum number of trunks required to satisfy the blocking objective and demand servicing constraint. Consequently, the difference $\underline{S}(k) - \underline{S}^*(k)$ indicates the deficit in final trunks due to the facility constraints. To account for this deficit economically, we formulate the problem of augmenting high-usage groups to relieve the final. First, let us introduce the notation:

$z(k)$ is the deficit in final trunks at year $k$, $z(k) = \underline{S}(k) - \underline{S}^*(k)$;

$z_j(k)$ is the portion of the deficit (number of final trunks) covered by augmenting subtending high-usage group $j$ at year $k$;

$d_j(k)$ is the trunk requirement on subtending group $j$ at year $k$;

$\delta_j(k)$ is the additional number of trunks on subtending group $j$ at year $k$ that compensates for one final trunk;

$\bar{\delta}_j(k)$ is the maximum reduction in the final trunk requirement that can be obtained by augmenting trunk group $j$ at year $k$;

$M$ is the total number of subtending trunk groups.

Also, we shall use the notation introduced in Section II for the cost functions $c_{1j}^k(\cdot)$, $c_{2j}^k(\cdot)$, $c_{3j}^k(\cdot)$, the controls $u_j(k)$, and the number of trunks in service at the beginning of the year, $T_j(k)$, where the index $j$ identifies the subtending trunk groups.

Given the initial trunk levels for high-usage groups, $T_1(0), \ldots, T_M(0)$, we wish to find a multiyear schedule of trunk augments and disconnects that minimizes the present worth of the planned servicing costs,

$$\min_{u_1, \ldots, u_M} L = \min_{u_1, \ldots, u_M} \sum_{k=0}^{N-1} \sum_{j=1}^{M} \rho^k [c_{1j}^k(u_j(k)) + c_{2j}^k(u_j(k)) + c_{3j}^k(T_j(k), u_j(k))] \qquad (47)$$


subject to the following conditions:
1. The number of trunks in service on high-usage group $j$ at year $k$ must be at least the new, inflated trunk requirement, that is,

$$T_j(k) \ge d_j(k) + \delta_j(k) z_j(k). \qquad (48)$$

2. The total trunk requirements on the subtending high-usage groups at year $k$ [given by (48)] must be sufficient to cover the deficit in final trunks, that is,

$$\sum_{j=1}^{M} z_j(k) \ge z(k). \qquad (49)$$

3. Since we assumed that the demand servicing augmentation is performed on the final group only, the trunk group dynamics equation for the subtending high-usage groups is described by

$$T_j(k+1) = T_j(k) + u_j(k). \qquad (50)$$

4. The unknown variables $z_j(k)$ must satisfy the feasibility constraints

$$0 \le z_j(k) \le \bar{\delta}_j(k). \qquad (51)$$


To solve the nonlinear optimization problem (47) through (51), note that if all the nonnegative variables $z_j(k)$ are fixed, then the minimization problem can be decomposed as follows:

$$\min_{u_1, \ldots, u_M} L = \sum_{j=1}^{M} \min_{u_j} L_j(z_j(0), \ldots, z_j(N-1)) = \sum_{j=1}^{M} \min_{u_j} \sum_{k=0}^{N-1} \rho^k [c_{1j}^k(u_j(k)) + c_{2j}^k(u_j(k)) + c_{3j}^k(T_j(k), u_j(k))].$$

Also, when the $z_j(k)$ are fixed, the minimum of the cost functional $L_j$, $L_j^*(z_j(0), \ldots, z_j(N-1))$, can be determined by the algorithm presented in Section 4.3. Therefore, the trunk capacity allocation problem is described by

$$\min_{z_j(k)} \sum_{j=1}^{M} L_j^*(z_j(0), \ldots, z_j(N-1)), \qquad (52)$$

subject to

$$\sum_{j=1}^{M} z_j(k) = z(k), \quad k = 0, \ldots, N-1,$$

where the $z_j(k)$ satisfy the feasibility constraints and $L_j^*(\cdot)$ is computed by the algorithm of Section 4.3.
We start by considering the case in which there is only one year, $k = k'$, such that $z(k) > 0$. Then, since the unknown trunk quantities $z_j(k')$ are integers, we can use a forward dynamic programming recursion for solving (47), i.e.,

$$f_j(y_j) = \min_{z_j(k')}[L_j^*(z_j(k')) + f_{j-1}(y_j - z_j(k'))], \quad j = 1, \ldots, M, \qquad (53)$$

where $f_0(0) = 0$ (with $f_0(y) = \infty$ for $y > 0$, so that the whole deficit is covered), $0 \le z_j(k') \le \bar{\delta}_j(k')$, $y_M = z(k')$, and $L_j^*(z_j(k')) = L_j^*(0, \ldots, z_j(k'), \ldots, 0)$ is calculated by the algorithm of Section 4.3.
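For the single-year case, recursion (53) is a standard resource-allocation dynamic program. In the sketch below the tables of $L_j^*(z)$ values are hypothetical inputs (in TIP they would come from the Section 4.3 algorithm), the length of each table encodes the upper bound of (51), and the boundary condition is taken as $f_0(0) = 0$ with $f_0(y) = \infty$ for $y > 0$ so that the whole deficit must be covered.

```python
import math
from typing import List, Sequence, Tuple


def allocate_deficit(costs: Sequence[Sequence[float]], deficit: int) -> Tuple[float, List[int]]:
    """Forward DP (53): split an integer deficit z(k') among M subtending groups.

    costs[j][z] is a hypothetical table of L_j*(z) for z = 0..cap_j, so cap_j
    plays the role of the upper bound in (51).
    """
    f = [0.0] + [math.inf] * deficit                 # f_0
    choice: List[List[int]] = []
    for table in costs:                              # j = 1..M
        g, arg = [math.inf] * (deficit + 1), [0] * (deficit + 1)
        for y in range(deficit + 1):
            for z in range(min(len(table) - 1, y) + 1):
                if f[y - z] + table[z] < g[y]:
                    g[y], arg[y] = f[y - z] + table[z], z
        f = g
        choice.append(arg)
    # Trace back the optimal z_j(k') starting from y_M = z(k').
    alloc, y = [], deficit
    for arg in reversed(choice):
        alloc.append(arg[y])
        y -= arg[y]
    return f[deficit], alloc[::-1]


print(allocate_deficit([[0, 3, 7, 12], [0, 2, 5, 9, 14], [0, 4, 6]], 5))  # (14.0, [1, 2, 2])
```

An infeasible deficit (larger than the sum of the caps) shows up as an infinite cost, which corresponds to the situation where the subtending groups cannot relieve the final on their own.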

Note that modularity constraints on HU groups can be easily 
incorporated into the discrete dynamic programming formulation (53) 
to reduce the computational burden. Furthermore, the method outlined by (53) can be used sequentially for each year for which $z(k) > 0$. There is no guarantee, however, that this “one-year-at-a-time” procedure will terminate at a global optimum. Various refinements of the “one-year-at-a-time” method are considered in Ref. 8. In general, these refinements increase the chance of reaching an optimum but require significantly more computation.

Finally, we add that the same mathematical approach can be used to decrease the number of trunks on subtending high-usage groups economically when there is extra capacity on the final. Recall that $\underline{S}^*(k)$ represents the minimum trunk requirement at year $k$ to satisfy the blocking and demand servicing constraints. Consequently, if $T(k) > \underline{S}^*(k)$, then the difference between the planned trunk level and $\underline{S}^*(k)$ defines the amount of extra capacity on the final group at year $k$ that can be used to reduce the planned servicing cost on subtending high-usage groups.


V. FINAL REMARKS 


We have described a theoretical development of a new capacity 
expansion planning process, TIP, that provides a cost-effective mul- 
tiyear schedule for trunk augments and disconnects for hierarchical 
networks. In contrast to the existing traffic engineering procedures, 
our solution accounts for forecast uncertainty, demand dynamics, 
trunk implementation costs, facility constraints, and demand servicing 
constraints. As we have shown in Sections III and IV, the dynamic 
programming approach to the stochastic capacity expansion problem 
yields a numerically efficient TIP algorithm that is easy to implement. 

New, mechanized forecasting systems based on the TIP algorithm 
have been recommended for implementation in the operating compa- 
nies and AT&T Communications. 
APPENDIX A
Proof of $(\underline{S}, \bar{S})$ Optimality

We start our inductive proof by showing that an $(\underline{S}, \bar{S})$-type policy is optimal and by constructing the critical thresholds if $N$ is equal to one, i.e., the last year, $N - 1$, is, in fact, the only year of the planning horizon.


A.1 The single-stage problem 


For economy of notation, we shall drop the index $N - 1$ from our equations. From (15) we wish to find an optimal policy $u^* = u^*(T)$ that satisfies

$$E\{g(d, T, u^*)\} = \min_u E\{g(d, T, u)\}. \qquad (54)$$

Equivalently, from (4) through (7), assuming $T$ trunks in service at the beginning of the year, we seek a planned trunk level $y^*(T) = T + u^*(T)$ that minimizes the single-year cost functional:

$$J(T, y) = \begin{cases} a_1(y - T) + a_2(y - T) + a_3 y + a_4 E \max(0, d - y), & y \ge T, \\ b_1(y - T) - b_2(y - T) + a_3 y + a_4 E \max(0, d - y), & y \le T. \end{cases} \qquad (55)$$

Note that when $u = 0$, the two branches of (55) are identical.
As we stressed in Section 3.1, to find an optimal solution of the single-stage problem described by (55), we shall show that the first differences of $J_1(T, T + u)$ and $J_2(T, T + u)$ satisfy conditions (1) through (3) of Section 3.1. Thus,

$$\Delta J_1(y) = (a_1 + a_2 + a_3 - a_4)m + a_4 E[\max(d, y + m) - \max(d, y)],$$
$$\Delta J_2(y) = (b_1 - b_2 + a_3 - a_4)m + a_4 E[\max(d, y + m) - \max(d, y)]. \qquad (56)$$

Since, for modular demand, $E[\max(d, y + m) - \max(d, y)] = mF(y)$, the second term in (56), combined with the $-a_4 m$ in the first term, represents the expected savings in demand servicing if one additional trunk module is planned: a saving of $a_4 m$ that occurs with probability $1 - F(y)$. Thus,

$$\Delta J_1(y) = (a_1 + a_2 + a_3)m - a_4(1 - F(y))m,$$
$$\Delta J_2(y) = (b_1 - b_2 + a_3)m - a_4(1 - F(y))m. \qquad (57)$$


From (57), conditions (1) and (2) of Section 3.1 are satisfied. Therefore, we obtain the minimum points of $J_1(T, x)$ and $J_2(T, x)$ for all modular $x$ by applying (25). From (57), $\underline{S}$ and $\bar{S}$ are the smallest values of $x$ on the discrete set $M$ such that

$$F(x) \ge 1 - \frac{a_1 + a_2 + a_3}{a_4} \quad \text{and} \quad F(x) \ge 1 - \frac{b_1 - b_2 + a_3}{a_4}, \qquad (58)$$

respectively. Then, from inequalities (8) through (10), $\underline{S}$ and $\bar{S}$ satisfy condition (3). Consequently, the optimal single-stage decision for augmenting or disconnecting trunks in the unconstrained case is given by the $(\underline{S}, \bar{S})$ policy of Section 3.1, with $\underline{S}$ and $\bar{S}$ defined by (58). In the presence of capacity constraints, we need to modify $\underline{S}$ and $\bar{S}$ only by (26).
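The closed form (58) can be checked against direct minimization of (55). The sketch assumes, purely for illustration, a Poisson demand with mean 20 truncated at 80 trunks, a module size of one trunk ($m = 1$, so that $E[\max(d, y+1) - \max(d, y)] = F(y)$ holds exactly on the integer grid), and cost values chosen only to satisfy $a_1 + a_2 > b_1 - b_2$. It returns both threshold pairs so they can be compared.

```python
import math
from typing import List


def poisson_pmf(mean: float, d_max: int) -> List[float]:
    """Hypothetical demand model: Poisson probabilities for d = 0..d_max."""
    return [math.exp(-mean + d * math.log(mean) - math.lgamma(d + 1)) for d in range(d_max + 1)]


def single_stage_check(a1, a2, a3, a4, b1, b2, mean=20.0, d_max=80):
    pmf = poisson_pmf(mean, d_max)
    cdf, acc = [], 0.0
    for p in pmf:
        acc += p
        cdf.append(acc)
    grid = range(d_max + 1)                                   # m = 1 trunk per module
    shortfall = [sum(p * max(0, d - y) for d, p in enumerate(pmf)) for y in grid]

    # Closed form (58): smallest x with F(x) at or above the critical ratio.
    s_low = next(x for x in grid if cdf[x] >= 1 - (a1 + a2 + a3) / a4)
    s_high = next(x for x in grid if cdf[x] >= 1 - (b1 - b2 + a3) / a4)

    # Brute force on each branch of (55) with T = 0; T only adds a constant
    # to each branch, so the minimizing y does not depend on it.
    J1 = [(a1 + a2) * y + a3 * y + a4 * shortfall[y] for y in grid]
    J2 = [(b1 - b2) * y + a3 * y + a4 * shortfall[y] for y in grid]
    return (s_low, s_high), (J1.index(min(J1)), J2.index(min(J2)))


closed, brute = single_stage_check(10, 2, 1, 30, 4, 2)
print(closed, brute)
```

`list.index(min(...))` returns the first minimizer, which matches the "smallest $x$" convention of (25) and (58).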


A.2 The multistage problem 

In this section, we prove by induction the optimality of an $(\underline{S}, \bar{S})$-type policy for stage $k$ by showing that the two branches of the cost-to-go function $J_k(T(k), y(k))$ satisfy conditions (1) to (3) of Section 3.1.

To simplify the recursion for $J_k(T(k), y(k))$, we introduce the auxiliary function $W_k(y)$:

$$W_k(y) = a_4^k y + \rho V_{k+1}(y). \qquad (59)$$

From the optimality principle and (59), the cost-to-go function from state $T(k)$ can be expressed by

$$J_k(T(k), y(k)) = \begin{cases} (a_1^k + a_2^k + a_3^k - a_4^k)\, y(k) - (a_1^k + a_2^k)\, T(k) + E_{d(k)} W_k(\max(d(k), y(k))), & y(k) \ge T(k), \\ (b_1^k - b_2^k + a_3^k - a_4^k)\, y(k) - (b_1^k - b_2^k)\, T(k) + E_{d(k)} W_k(\max(d(k), y(k))), & y(k) \le T(k). \end{cases} \qquad (60)$$

From (15) and (59), $W_{k-1}(\cdot)$ satisfies the recursion

$$W_{k-1}(y) = \min_{u(k)} E_{d(k)}\left\{a_4^{k-1} y + \rho\left[c_1^k(u(k)) + c_2^k(u(k)) + a_3^k(y + u(k)) - a_4^k(y + u(k)) + W_k(\max(d(k), y + u(k)))\right]\right\}, \qquad (61)$$

where $k = N-1, \ldots, 0$.

As in Section 3.1, we consider the two branches of $J_k(T(k), y(k))$, that is, $J_{k,1}(T(k), y(k))$ and $J_{k,2}(T(k), y(k))$. Then, we need to show that $J_{k,1}$ and $J_{k,2}$ satisfy conditions (1) to (3).

From the definition of $J_{k,1}$ and $J_{k,2}$ and (60), condition (1) is trivial. To demonstrate (2) we have to prove that for any $k$

$$\Delta H_k(x) = E_{d(k)} W_k(\max(d(k), x + m)) - E_{d(k)} W_k(\max(d(k), x))$$

is a nondecreasing function of $x$. First, we consider the case $k = N - 1$.

We shall approach (2) by studying the properties of $W_{N-1}(x)$ and applying standard convexity results. In particular, since $W_{N-1}(x)$ and $\max(d, x)$ are monotonically increasing functions in $x$ with nondecreasing first differences, the composite function $W_{N-1}(\max(d, x))$ must also be an increasing function in $x$ with nondecreasing first differences.” In addition, the monotonicity of the functions $W_{N-1}(\max(d, x))$ and $\Delta W_{N-1}(\max(d, x))$ is preserved by the expected value operation. Thus, condition (2) is satisfied for $k = N - 1$.

To prove condition (2) inductively for an arbitrary $k$, we have to show that recursion (61) preserves the monotonicity of $W_k(\cdot)$ and $\Delta W_k(y)$.

First, assuming the monotonicity of $W_k(y)$ and $\Delta W_k(y)$, we obtain (in a fashion similar to that for the case $k = N - 1$) that the composite function $W_k[\max(d(k), y + u(k))]$ has the same properties in $y$. Because of the linearity in $y$ of the remaining part of the right-hand side of (61) and the cost-of-money assumption, $\rho < 1$, the expression under the expected value sign in (61) is the sum of increasing functions in $y$ with nondecreasing first differences. Thus


$$W_{k-1}(y) = \min_{u(k)} E_{d(k)} f(y, d(k), u(k)), \qquad (62)$$

where $f(y, d(k), u(k))$ is increasing in $y$ and $\Delta f(y, d(k), u(k))$ is nondecreasing in $y$.

Second, the monotonicity of $f(\cdot)$ and $\Delta f(\cdot)$ in $y$ implies the monotonicity of the expected value. Thus, from D. Dantzig's convexity preservation result, it follows that the minimum of the expected value of $f(\cdot)$ is also a convex sequence in $y$, that is, the first differences are nondecreasing in $y$. Also, the monotonicity of $f(\cdot)$ is preserved by the minimum (infimum) operation. Indeed, if an arbitrary function $f(y, z)$ is increasing in $y$ for each $z$, then for any $y_1 < y_2$ and any $z$ we have

$$\inf_z f(y_1, z) \le f(y_1, z) \le f(y_2, z)$$

and, therefore,

$$\inf_z f(y_1, z) \le \inf_z f(y_2, z).$$
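For completion of the proof, we need to demonstrate (3), i.e., that in the case with no capacity constraints, $\bar{y}(k) = \infty$, the minimum points of $J_{k,1}$ and $J_{k,2}$, $\underline{S}(k)$ and $\bar{S}(k)$, respectively, are finite and satisfy the condition

$$0 \le \underline{S}(k) \le \bar{S}(k) < \infty. \qquad (63)$$

Calculating the first differences of $J_{k,1}$ and $J_{k,2}$, we obtain

$$\Delta J_{k,1}(y(k)) = [a_1^k + a_2^k + a_3^k - a_4^k(1 - F_k(y(k)))]m + \rho\, \Delta E_{d(k)} V_{k+1}(\max(d(k), y(k))),$$
$$\Delta J_{k,2}(y(k)) = [b_1^k - b_2^k + a_3^k - a_4^k(1 - F_k(y(k)))]m + \rho\, \Delta E_{d(k)} V_{k+1}(\max(d(k), y(k))), \qquad (64)$$

where the first differences $\Delta E V_{k+1}$ are taken with respect to $y(k)$. Since $F_k(x) \to 1$ as $x \to \infty$, it follows from (64) and (30) that

$$\lim_{x \to \infty} \Delta J_{k,1}(x) = [a_1^k + a_2^k + a_3^k]m - \rho[b_1^{k+1} - b_2^{k+1}]m,$$
$$\lim_{x \to \infty} \Delta J_{k,2}(x) = [b_1^k - b_2^k + a_3^k]m - \rho[b_1^{k+1} - b_2^{k+1}]m. \qquad (65)$$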
Relationship (65) shows that for sufficiently large $x$, both expressions are necessarily positive. Therefore, $\underline{S}(k)$ and $\bar{S}(k)$, which denote the smallest elements of the discrete set $M = \{0, m, 2m, \ldots\}$ such that

$$\Delta J_{k,1}(x) \ge 0 \quad \text{and} \quad \Delta J_{k,2}(x) \ge 0 \qquad (66)$$

for (64), are finite. In addition, since $a_1^k + a_2^k > b_1^k - b_2^k$, it follows that $\Delta J_{k,1}(x) \ge \Delta J_{k,2}(x)$ for all $x$; hence,

$$\underline{S}(k) \le \bar{S}(k).$$

Thus, if $\bar{y}(k)$ is equal to infinity, then an $(\underline{S}, \bar{S})$-type policy is optimal for the year $k$. We note that convexity preservation arguments hold for any region on which our sequences are defined. Consequently, in the constrained case, an optimal solution is given by (26).

The proof for the multistage problem is complete.


APPENDIX B 
Derivation of the Recursion 

To obtain explicit formulas for calculating AJ;,; and AJ;,2, we shall 
use several recursions derived in Appendix A. In particular, since the 


optimal policy for stage k + 1 is described by (S(k + 1), S(k + 1)), 
applying u*(k + 1) in (60) we can rewrite eq. (15) as 


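$$
V_{k+1}(T(k+1)) = E_{d(k+1)}
\begin{cases}
(a_1^{k+1} + a_2^{k+1} + a_3^{k+1} - a_4^{k+1})\,\underline{S}(k+1) - (a_1^{k+1} + a_2^{k+1})\,T(k+1) \\
\quad + W_{k+1}(\max(d(k+1), \underline{S}(k+1))), & T(k+1) < \underline{S}(k+1) \\
a_3^{k+1}\,T(k+1) - a_4^{k+1}\,T(k+1) + W_{k+1}(\max(d(k+1), T(k+1))), & \underline{S}(k+1) \le T(k+1) < \bar{S}(k+1) \\
(b_1^{k+1} - b_2^{k+1} + a_3^{k+1} - a_4^{k+1})\,\bar{S}(k+1) - (b_1^{k+1} - b_2^{k+1})\,T(k+1) \\
\quad + W_{k+1}(\max(d(k+1), \bar{S}(k+1))), & T(k+1) \ge \bar{S}(k+1)
\end{cases}
\qquad (67)
$$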


where the expected value is taken with respect to $F_{k+1}$, the demand
distribution at year k + 1. Note that at the boundaries ($\underline{S}(k+1)$ and
$\bar{S}(k+1)$) the two corresponding branches of (67) coincide. Now,
we shall carry (67) one step backward. Thus, we replace T(k + 1) by
the trunk group dynamics equation, $T(k+1) = \max(d(k), T(k) + u(k))$,
and consider that T(k) [rather than T(k + 1)] is fixed. To simplify the
notation we replace T(k) + u(k) by y. In the rest of this appendix our
objective is to obtain a recursion for

$$\Delta E_{d(k)} V_{k+1}(\max(d(k), y)) = E_{d(k)} V_{k+1}[\max(d(k), y + m)] - E_{d(k)} V_{k+1}[\max(d(k), y)].$$
From (67) we obtain

$$
\Delta E_{d(k)} V_{k+1}(\max(d(k), y)) =
\begin{cases}
-(a_1^{k+1} + a_2^{k+1})\,m F_k(y), & y < \underline{S}(k+1) \\
[a_3^{k+1} - a_4^{k+1} + a_4^{k+1} F_{k+1}(y)]\,m F_k(y) \\
\quad + \rho\,\Delta E_{d(k)} E_{d(k+1)} V_{k+2}\{\max[d(k+1), \max(d(k), y)]\}, & \underline{S}(k+1) \le y < \bar{S}(k+1) \\
-(b_1^{k+1} - b_2^{k+1})\,m F_k(y), & y \ge \bar{S}(k+1).
\end{cases}
\qquad (68)
$$
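First, we note that

$$
\Delta V_{k+1}(\max(d(k), y)) =
\begin{cases}
0, & y < d(k) \\
\Delta V_{k+1}(y), & y \ge d(k).
\end{cases}
\qquad (69)
$$

From (69) it follows that

$$\Delta E_{d(k)} V_{k+1}(\max(d(k), y)) = F_k(y)\,\Delta V_{k+1}(y). \qquad (70)$$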


Second, we set $R = \max[d(k), d(k+1)]$. Then

$$
\Delta V_{k+2}\{\max[d(k+1), \max(d(k), y)]\} = \Delta V_{k+2}\{\max(y, R)\} =
\begin{cases}
0, & y < R \\
\Delta V_{k+2}(y), & y \ge R.
\end{cases}
\qquad (71)
$$


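Taking expectation with respect to the demand distribution $F_{k+1}$, we
obtain

$$
\Delta E_{d(k+1)} V_{k+2}\{\max(y, R)\} =
\begin{cases}
0, & y < d(k) \\
F_{k+1}(y)\,\Delta V_{k+2}(y), & y \ge d(k).
\end{cases}
\qquad (72)
$$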


Consequently,

$$\Delta E_{d(k)} \Big[ E_{d(k+1)} V_{k+2}\{\max(y, R)\} \Big] = F_k(y)\,F_{k+1}(y)\,\Delta V_{k+2}(y). \qquad (73)$$


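Applying (70) and (73), we arrive at

$$
\Delta V_{k+1}(y) =
\begin{cases}
-(a_1^{k+1} + a_2^{k+1})\,m, & y < \underline{S}(k+1) \\
[a_3^{k+1} - a_4^{k+1}(1 - F_{k+1}(y))]\,m + \rho\,F_{k+1}(y)\,\Delta V_{k+2}(y), & \underline{S}(k+1) \le y < \bar{S}(k+1) \\
-(b_1^{k+1} - b_2^{k+1})\,m, & y \ge \bar{S}(k+1).
\end{cases}
\qquad (74)
$$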
Finally, we note that because of (70), (74) gives the desired recursion:

$$
\Delta E_{d(k)} V_{k+1}(\max(d(k), y)) = F_k(y)
\begin{cases}
-(a_1^{k+1} + a_2^{k+1})\,m, & y < \underline{S}(k+1) \\
[a_3^{k+1} - a_4^{k+1}(1 - F_{k+1}(y))]\,m + \rho\,\Delta E_{d(k+1)} V_{k+2}\{\max(d(k+1), y)\}, & \underline{S}(k+1) \le y < \bar{S}(k+1) \\
-(b_1^{k+1} - b_2^{k+1})\,m, & y \ge \bar{S}(k+1).
\end{cases}
$$
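The recursion (74) lends itself to a backward computation on the grid. The Python sketch below performs one backward step under stated assumptions. The caller supplies the year-(k + 1) thresholds, the cost coefficients, the distribution function $F_{k+1}$, and the values $\Delta V_{k+2}(y)$. The function names are hypothetical. Multiplying the result by $F_k(y)$, as in (70), gives the term $\Delta E_{d(k)} V_{k+1}(\max(d(k), y))$ that enters (64).

```python
# Hypothetical helper names; a sketch of one backward step of (74).

def delta_v(grid, m, coeffs, s_lo, s_hi, cdf, next_dv, rho):
    """Return Delta V_{k+1}(y) for every y in grid.

    coeffs = (a1, a2, a3, a4, b1, b2) for year k+1; s_lo, s_hi are the
    year-(k+1) thresholds; cdf(y) = F_{k+1}(y); next_dv[i] = Delta V_{k+2}(grid[i]).
    """
    a1, a2, a3, a4, b1, b2 = coeffs
    out = []
    for i, y in enumerate(grid):
        if y < s_lo:                         # below the band: trunks are added
            out.append(-(a1 + a2) * m)
        elif y < s_hi:                       # inside the band: no action
            F = cdf(y)
            out.append((a3 - a4 * (1.0 - F)) * m + rho * F * next_dv[i])
        else:                                # above the band: trunks are removed
            out.append(-(b1 - b2) * m)
    return out

def expected_delta(grid, cdf_k, dv):
    """Eq. (70): Delta E_{d(k)} V_{k+1}(max(d(k), y)) = F_k(y) * Delta V_{k+1}(y)."""
    return [cdf_k(y) * v for y, v in zip(grid, dv)]

def uniform_cdf(y):
    """Hypothetical F_{k+1}: demand uniform on {0, ..., 4}."""
    return min(1.0, (y + 1) / 5)

grid = [0, 1, 2, 3, 4]
dv = delta_v(grid, 1, (1.0, 0.5, 0.2, 10.0, 0.3, 0.1), 1, 3,
             uniform_cdf, [0.0] * 5, rho=0.9)   # Delta V_{k+2} taken as 0 here
print(dv)
```

Repeating the step year by year, with `dv` from one step passed as `next_dv` to the previous year, gives the backward sweep.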


APPENDIX C 
Computing Demand Servicing Level 

As we discussed in Section 3.3, to solve the TIP problem described
by (11) through (13) we need to be able to compute the expected
level of demand servicing for a given $(\underline{S}, \bar{S})$-type policy $\pi$.

For a given $\pi$, the planned trunk level $y(k) = T(k) + u(k)$ is
a random variable that depends only on the previous demands
$d(0), \ldots, d(k-1)$ and is independent of the future demands
$d(k), \ldots, d(N-1)$. Accordingly, for the expected level of demand
servicing corresponding to $\pi$, we get

$$E\{\max(0, d(k) - y(k))\} = E_{y(k)}\{E\{\max(0, d(k) - y(k)) \mid y(k)\}\} = \int_0^\infty \Big[\int_x^\infty (y - x)\,dF_k(y)\Big]\,dG_k(x), \qquad (75)$$

where $G_k(x)$ is the distribution function of the planned trunk level
$y(k)$.
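By the definition of the optimal policy $u^*$, the distribution function
$G_k$ of $T(k) + u^*(k)$ is

$$
G_k(x) = P(y(k) \le x) = P(T(k) + u^*(k) \le x) =
\begin{cases}
0, & x < \underline{S}(k) \\
P(T(k) \le x), & \underline{S}(k) \le x < \bar{S}(k) \\
1, & x \ge \bar{S}(k)
\end{cases}
\qquad (76)
$$

for $k = 0, 1, \ldots, N-1$.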

Using the fact that, for a given policy $\pi$, the planned trunk level
$y(k)$ is independent of the demand in that year, $d(k)$, and our
assumption that $d(k-1)$ is independent of $d(0), d(1), \ldots, d(k-2)$,
we can derive a simple recursive formula for $G_k(x)$ by calculating
$P(T(k) \le x)$:

$$
\begin{aligned}
H_k(x) &= P(T(k) \le x) = P\{\max(d(k-1), y(k-1)) \le x\} \\
&= P(d(k-1) \le x)\,P(y(k-1) \le x) = F_{k-1}(x)\,G_{k-1}(x) \\
&= F_{k-1}(x)
\begin{cases}
0, & x < \underline{S}(k-1) \\
H_{k-1}(x), & \underline{S}(k-1) \le x < \bar{S}(k-1) \\
1, & x \ge \bar{S}(k-1)
\end{cases}
\end{aligned}
\qquad (77)
$$

for $k = 1, 2, \ldots, N-1$.


Recalling that, at the beginning of the first year, the number of
trunks in service, T(0), is specified, we have

$$
H_0(x) =
\begin{cases}
0, & x < T(0) \\
1, & x \ge T(0).
\end{cases}
\qquad (78)
$$


Formulas (76) to (78) together with expression (27) specify the
forward recursion for calculating the expected demand servicing level
at year k ($k = 0, \ldots, N-1$) associated with the policy $\pi$.
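A minimal Python sketch of this forward recursion follows. It assumes integer trunk levels, a demand distribution for each year given as a probability mass function on $\{0, 1, \ldots, x_{\max}\}$, and known thresholds $\underline{S}(k)$, $\bar{S}(k)$. The function name and argument layout are hypothetical. For each year, the sketch forms $G_k$ from $H_k$ by (76) and evaluates the expected shortfall of (75) as a sum over the point masses of $y(k)$. It then advances to $H_{k+1} = F_k G_k$ by (77).

```python
from itertools import accumulate

def expected_servicing(T0, thresholds, demand_pmfs, xmax):
    """Forward recursion (76)-(78) with the expected shortfall of (75).

    Hypothetical helper: thresholds[k] = (S_lo(k), S_hi(k)) and
    demand_pmfs[k][d] = P(d(k) = d) for d = 0..xmax (assumed bounded,
    integer demand). Returns E max(0, d(k) - y(k)) for each year k.
    """
    xs = range(xmax + 1)
    H = [1.0 if x >= T0 else 0.0 for x in xs]               # (78): T(0) is known
    levels = []
    for (s_lo, s_hi), pmf in zip(thresholds, demand_pmfs):
        # (76): distribution function G_k of the planned level y(k)
        G = [0.0 if x < s_lo else 1.0 if x >= s_hi else H[x] for x in xs]
        mass = [G[0]] + [G[x] - G[x - 1] for x in xs if x > 0]   # P(y(k) = x)
        # inner integral of (75): E(d(k) - x)^+ for each fixed level x
        shortfall = [sum((d - x) * p for d, p in enumerate(pmf) if d > x) for x in xs]
        levels.append(sum(w * s for w, s in zip(mass, shortfall)))
        # (77): H_{k+1}(x) = F_k(x) G_k(x), using independence of d(k) and y(k)
        F = list(accumulate(pmf))
        H = [F[x] * G[x] for x in xs]
    return levels

uniform = [0.2] * 5                                          # hypothetical demand on {0,...,4}
print(expected_servicing(0, [(2, 2), (1, 3)], [uniform, uniform], 4))  # roughly [0.6, 0.44]
```

In the second year of the example, T(1) = max(d(0), 2). The policy then gives y(1) = 2 with probability 0.6 and y(1) = 3 with probability 0.4. The expected shortfall is therefore 0.6(0.6) + 0.4(0.2) = 0.44.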

Note that using similar independence arguments we can derive 
forward recursions for calculating other quantities of practical interest 
such as the total expected cost of trunk provisioning over the planning 
horizon, the difference in capital cost for two competing $(\underline{S}, \bar{S})$-type
policies, or the probability of demand servicing. The last quantity is 
another important measure of demand servicing activity, since it 
allows us to predict the portion of only-route groups that will require 
emergency servicing in a given year. 

To calculate the probability of demand servicing, for example, we
observe that for a given policy $\pi$ the planned trunk level $y(k)$ depends
only on the previous demands $d(0), \ldots, d(k-1)$. Thus we obtain

$$P\{d(k) > y(k)\} = \int_0^\infty [1 - F_k(x)]\,dG_k(x),$$

where $G_k(\cdot)$ is the distribution function of $y(k)$. Integrating by parts
and replacing $G_k$ by $H_k$, we arrive at

$$P\{d(k) > y(k)\} = \int_{\underline{S}(k)}^{\bar{S}(k)} H_k(x)\,dF_k(x) + 1 - F_k(\bar{S}(k)), \qquad k = 0, 1, \ldots, N-1.$$
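On an integer grid the same quantity can be computed two ways, which gives a useful check on the integration by parts. The Python sketch below uses a hypothetical function name and assumes bounded integer demand. The `direct` value sums $[1 - F_k(x)]$ against the point masses of $y(k)$, and `by_parts` uses $H_k$. For discrete distributions, $G_k$ must be evaluated just below each demand value. The band term therefore runs over $\underline{S}(k) < d \le \bar{S}(k)$ and uses $H_k(d-1)$.

```python
from itertools import accumulate

def prob_servicing(H, pmf, s_lo, s_hi):
    """P(d(k) > y(k)) computed two ways on the integer grid 0..len(pmf)-1.

    Hypothetical helper: H[x] = H_k(x) = P(T(k) <= x), pmf[d] = P(d(k) = d).
    """
    xs = range(len(pmf))
    F = list(accumulate(pmf))                                          # F_k
    G = [0.0 if x < s_lo else 1.0 if x >= s_hi else H[x] for x in xs]  # (76)
    g = [G[0]] + [G[x] - G[x - 1] for x in xs if x > 0]                # P(y(k) = x)
    direct = sum(g[x] * (1.0 - F[x]) for x in xs)                      # [1 - F_k] against dG_k
    # Discrete counterpart of the by-parts form: G_k evaluated at d - 1.
    by_parts = sum(H[d - 1] * pmf[d] for d in xs if s_lo < d <= s_hi) + 1.0 - F[s_hi]
    return direct, by_parts

H1 = [0.0, 0.0, 0.6, 0.8, 1.0]         # hypothetical H_k on {0,...,4}
print(prob_servicing(H1, [0.2] * 5, 1, 3))
```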
This paper presents an analysis of a multichannel Time Division Multiple
Access (TDMA) blocking system. Such a system is of interest for real-time 
voice-traffic applications. The effects of different traffic-assignment algo- 
rithms, traffic loads, number of channels, number of time slots, and number 
of traffic nodes on system performance are studied, where performance is 
measured by the probability that an incoming message will be blocked. An 
approximate analytical solution is found, the results of which compare exceed- 
ingly well with results obtained from computer simulation. Also derived is a 
rigorous lower bound on the blocking probability. Collectively, these results 
indicate that, for most systems of interest, blocking probability is insensitive 
to the assignment algorithm used. The performance of an assignment algo- 
rithm that is simplest to implement is therefore nearly optimal. 


I. INTRODUCTION 


A multichannel Time Division Multiple Access (TDMA) protocol 
provides an efficient means of sharing a high-capacity communication 
channel among a network of users. In a multichannel TDMA system, 
the aggregate channel capacity is partitioned in both the frequency 
and time domains. Each of several channels has some fraction of total 
bandwidth and consists of a series of time slots. A fixed number of 
channel time slots are combined to form the TDMA frame. There is 
an extensive literature describing and analyzing this protocol, especially
in the context of scanning beam communication satellite systems.

The sequence of slot-by-slot switching configurations, which de- 
scribes the origin and destination nodes of the traffic links assigned 
to each channel and time slot, is called a traffic assignment. Much 
attention has been focused on the problem of designing efficient 
traffic-assignment algorithms for the static case, i.e., where the assign- 
ment schedule does not change from frame to frame. However, because 
messages originate at random times and are of random duration, such 
a static assignment can be wasteful of bandwidth, since a time slot is 
unused during idle periods. 

To overcome the inefficiency of a static assignment, a network 
controller can allocate channels and time slots to the traffic nodes 
according to instantaneous traffic needs. In this scheme, the switching 
configuration may change from frame to frame. This dynamic assign- 
ment of channel capacity is called Demand Assignment TDMA (DA/ 
TDMA). 

This paper presents an analysis of a multichannel DA/TDMA 
protocol. We consider a blocking system in which incoming traffic 
that cannot be immediately serviced is blocked (i.e., turned away). 
Only one type of incoming traffic is considered. Thus, this model is 
appropriate for a voice-traffic system. 

We compare the blocking probability obtained using an optimal- 
assignment algorithm, which allows a complete reconfiguration of the 
switching pattern in each frame, with the blocking probability obtained 
using a fixed-assignment algorithm, which allows no rearrangement 
of existing traffic; i.e., in the fixed-assignment case, a message that 
requires more than one frame for transmission occupies the same 
channel and time slot in each frame. Notice that both the optimal- 
and fixed-assignment algorithms are more general than a static reser- 
vation scheme in which the switching configuration is the same in 
each frame. A tight lower bound on blocking probability obtained 
using an optimal-assignment algorithm is derived in addition to an 
accurate approximation for the blocking probability resulting from a 
type of fixed assignment called random assignment. Based upon our 
analytical results, we conclude that, for systems of moderate size, the 
blocking probabilities obtained using optimal- and fixed-assignment 
schemes are nearly identical. This is a significant finding since it 
implies that the complexity of optimal assignment is usually unnec- 
essary. 

The next section describes the multichannel DA/TDMA protocol 
and the traffic assignment problem. Section III contains a description 
of the network mathematical model used for the analysis, and a 
derivation of the equilibrium-state equations for the associated Markov
process. Section IV computes the probability that an incoming-traffic
request is blocked, and Section V presents comparisons of
analytical with computer simulation results. 


II. THE MULTICHANNEL ASSIGNMENT PROBLEM


This section describes the multichannel traffic-assignment problem. 
We start with a network consisting of communicating traffic nodes. 
The channel capacity in this case is partitioned both in the frequency 
and time domains. In particular, the bandwidth, B, of the transmission 
medium is divided into m channels, each having bandwidth B/m, and 
each channel consists of a series of time slots. A prespecified number 
of channel time slots are combined to form a transmission frame that 
continually repeats itself. The reservation multichannel TDMA pro- 
tocol under consideration assumes a network controller that assigns 
time slots to incoming-traffic requests on a noninterfering basis. 

Figure 1 shows one frame of a multichannel TDMA scheme with 
three channels and four time slots per frame. Each slot shows a traffic 
link assigned to that interval. The configuration shown in Fig. 1 will 
henceforth be referred to as a channel-time slot matrix. Denoting the 
number of channels by m and the number of time slots by c, this 
matrix will in general have m rows and c columns, and each entry will 
be a two-dimensional vector consisting of the transmitting and receiv- 
ing nodes. Notice that this multichannel technique assumes that each 
node can transmit and/or receive on any (single) channel during a 
given time slot, and that the channel on which a node is transmitting 
or receiving can change over successive time slots. Furthermore, we 
also assume that the assigned channel and time slot for a given traffic 
link may change from frame to frame. In particular, the traffic link 1-2
shown in Fig. 1 may be reassigned from channel 1/time slot 2 to
some other channel and time slot in subsequent frames. 

The assignment of incoming traffic to available time slots is governed
by the following fundamental constraint: one node may not
transmit or receive on two different channels during the same time
slot. In contrast to single-channel TDMA, in the multichannel case
the network controller must not only determine whether a slot is being
used, but must also determine whether a given traffic request can be
assigned to that slot without violating the fundamental constraint. It
therefore may not be possible to assign a given traffic request even
though empty time slots exist.

    4-3   2-1   1-3   3-2
    2-1   3-2   4-2   2-3
    3-2   1-3   3-4

Fig. 1—One three-channel time division multiple access frame.

Given unlimited computational capability, it may be desirable to 
rearrange traffic already assigned to the channel-time slot matrix in 
order to accommodate a new traffic arrival. It is therefore of interest 
to know under what conditions a set of traffic requests can be assigned. 
Denoting the number of traffic nodes by n, we define the n × n traffic
matrix T as the matrix whose (i, j)th element contains the number of
traffic units node i is transmitting to node j (each traffic unit repre-
sents one packet and is assigned to one time slot). Given a traffic
matrix, T, and an empty channel-time slot matrix, all of the traffic
can be assigned without violating the fundamental constraint if and
only if the following matrix constraints are satisfied:

$$T\mathbf{1} \le c\,\mathbf{1}, \qquad (1a)$$

$$\mathbf{1}'T \le c\,\mathbf{1}', \qquad (1b)$$

and

$$\mathbf{1}'T\mathbf{1} \le mc, \qquad (1c)$$


where $\mathbf{1}$ is an n-dimensional vector whose elements are all unity, m
denotes the number of channels, c denotes the number of time slots,
and prime denotes transpose. These equations imply that no trans-
mitting node requires more than c time slots, no receiving node requires
more than c time slots, and the total traffic demand does not exceed
the total number (mc) of available time slots. To check whether it is
possible to assign a new traffic request, one therefore need only check
the matrix constraint set (1), where T contains the traffic already assigned
in addition to the new traffic request.
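The admissibility test just described can be sketched directly from (1). In the Python sketch below, the function names are hypothetical, and nodes are indexed from 0 rather than 1.

```python
def satisfies_constraints(T, m, c):
    """Check (1a)-(1c) for an n x n traffic matrix T given as a list of rows."""
    row_sums = [sum(row) for row in T]           # (1a): each transmitter uses <= c slots
    col_sums = [sum(col) for col in zip(*T)]     # (1b): each receiver uses <= c slots
    total = sum(row_sums)                        # (1c): at most m*c slots per frame
    return max(row_sums) <= c and max(col_sums) <= c and total <= m * c

def can_accept(T, i, j, m, c):
    """Hypothetical helper: does T still satisfy (1) after one more i -> j unit?"""
    trial = [row[:] for row in T]
    trial[i][j] += 1
    return satisfies_constraints(trial, m, c)

T = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]                    # hypothetical 3-node traffic matrix, m = 2, c = 2
print(can_accept(T, 0, 1, 2, 2))   # → True (row 0 and column 1 reach c, total reaches mc)
```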

Because traffic requests are time varying, of ultimate interest is how 
to assign incoming traffic dynamically so as to minimize the number 
of times a new traffic request cannot be assigned without rearranging 
assigned traffic. This problem is quite difficult and has not yet been 
addressed in the open literature. The difficulty arises from the fact 
that future traffic requests are unknown, and hence, given any assign- 
ment rule, it is always possible to receive a sequence of traffic requests 
such that one can be assigned only if existing traffic is rearranged. 
Given a matrix T such that the constraint set (1) is satisfied, however, 
the static problem of efficiently assigning all of the traffic in T to an 
empty channel-time slot matrix has been addressed in Refs. 6 and 8
through 10. We also point out that these methods do not result in a 
unique assignment. 

If we assume that a total rearrangement of existing traffic is allowed 
at the time each traffic request is made, then an optimal traffic 
assignment scheme can be inferred from the matrix constraint set (1). 
(“Optimal” in this case means that the probability of not being able 
to assign a new traffic request is minimized.) In particular, each time 
a new traffic request is made, the traffic matrix constraint set (1) is
checked. If it is satisfied, then a “brute force” method for assigning
the new traffic request would be to “empty out” the existing channel- 
time slot matrix and reassign all of this traffic along with the new 
traffic request via one of the methods suggested in Refs. 6 and 8 
through 10. Certainly, this scheme requires much more computational 
power than necessary. If a new traffic request cannot be assigned to 
existing empty time slots, it is likely that very few (i.e., one or two) 
existing traffic assignments would have to be rearranged in order to 
assign the new traffic request. 

If traffic assignments are to be made in real time, as in a satellite
system, the complexity of the assignment scheme becomes a crucial 
issue. The brute force optimal assignment scheme previously described 
would optimize system performance; however, under moderate to 
heavy loads, this scheme is likely to be impractical so that simpler 
assignment schemes, which yield suboptimal performance (i.e., a 
higher blocking probability), must be used. This raises the question of 
how much performance degradation is caused by using simpler assign- 
ment schemes rather than an optimal assignment scheme that permits 
an unlimited number of rearrangements. 

The approach taken in this paper is to compare analytically the 
blocking probability obtained for a given system using an optimal 
assignment algorithm with the blocking probability obtained using 
fixed-assignment algorithms, which allow no rearrangement. Under a 
fixed-assignment algorithm, if a new traffic request cannot be assigned 
to a given channel-time slot matrix, it is blocked. Notice that to 
determine whether incoming traffic can be assigned when using an 
optimal-assignment algorithm, the traffic matrix must be examined, 
whereas when using a fixed-assignment algorithm, the channel-time 
slot matrix must be examined. 


III. MATHEMATICAL FORMULATION
3.1 Traffic model 


The purpose of this subsection is to specify the mathematical model 
used to generate the analytical results in the following sections. The 
incoming traffic is modeled as the sum of independent Poisson processes
flowing between each pair of traffic nodes. We therefore have an
arrival rate matrix, $\Lambda$, whose (i, j)th element is the Poisson rate at
which messages are transmitted from node i to node j. The flow rates
for traffic between each pair of nodes are assumed to be identical, i.e.,
$\Lambda = \lambda' \mathbf{1}\mathbf{1}'$, where $\lambda'$ is a constant. The total traffic arrival rate, $\lambda$, is
therefore the sum of the rates between each pair of nodes, $\lambda = n^2 \lambda'$.

Because the traffic out of each node is assumed to be independent 
of the traffic out of all other nodes, given a new traffic request, the 
probability that the request originated from a specific node i is 1/n 
and, similarly, the probability that the destination is node j is also 
1/n. Because the transmission processes between each pair of nodes
are independent, the probability that a given traffic request originates
from node i and is sent to node j is $1/n^2$.

If a traffic request can be assigned, it occupies one slot of the 
channel-time slot matrix for a random amount of time, depending on 
the message length. This “service” time is assumed to be exponentially 
distributed. Associated with the incoming traffic is therefore the
exponential service rate, $\mu$. For analytical convenience, we assume
that this distribution is continuous in the sense that departures from
the channel-time slot matrix can occur at any time instant. Notice,
however, that for a real system, departures occur only at the end of a
given frame. The corresponding service time distributions must there-
fore be “discretized,” since service times must be an integer number
of frames. This effect becomes negligible, however, if the average
service time consists of a large number of frames (i.e., more than 100).


3.2 Derivation of equilibrium-state equations for optimal assignment 


The derivations of the state equations for the Markov processes 
associated with optimal and fixed assignments are virtually identical. 
We present, therefore, only the details of the derivation for optimal 
assignment and then indicate how to modify that derivation to obtain 
the state equations for fixed assignment. 

When an optimal assignment algorithm is used, the traffic matrix 
T determines whether an incoming traffic request is blocked. Because 
the traffic arrival process is Poisson and the service times are expo- 
nential, the evolution of the traffic matrix T is described by a Markov 
process. The set of states for the Markov process associated with 
optimal assignment is, therefore, the set of n × n matrices T with
integer elements that satisfy the constraint set (1). The following 
notation is helpful in defining the transition rates for the process: 


$$S = \{T \mid T \text{ satisfies (1a) through (1c)}\} \qquad (2a)$$

$$S_k = \{T \mid T \in S,\ \mathbf{1}'T\mathbf{1} = k\} \qquad (2b)$$

$$S_T^- = \{T^* \mid T^* \in S,\ \exists (i, j) \text{ such that } T^* = T - e_i e_j'\} \qquad (2c)$$

$$S_T^+ = \{T^* \mid T^* \in S,\ \exists (i, j) \text{ such that } T^* = T + e_i e_j'\}, \qquad (2d)$$

where $e_i$ is the n × 1 vector with 1 in the ith place and 0 elsewhere.
The set $S_k$ is the set of all states such that there are k messages in the
system; $S_T^-$ and $S_T^+$ are the sets of states into which the process makes
transitions via departures and arrivals, respectively, given that the
process is in state T. Notice that if $T^* \in S_T^-$, then $T \in S_{T^*}^+$ and,
similarly, if $T^* \in S_T^+$, then $T \in S_{T^*}^-$.

Let $r(T^1, T^2)$ be the transition rate† between any two states $T^1$ and
$T^2$. Notice that a transition from state $T^1$ to a different state $T^2$ can occur only
via a single arrival or a single departure. The rate $r(T^1, T^2)$ is,
therefore, nonzero if and only if $T^1 = T^2$, or $T^1$ and $T^2$ differ from
each other in exactly one element and that difference is one. Thus,

$$
r(T^1, T^2) =
\begin{cases}
\mu\, t^1_{ij}, & T^2 \in S_{T^1}^- \\
\lambda / n^2, & T^2 \in S_{T^1}^+ \\
-\mu \sum_{i,j} t^1_{ij} - \lambda\, |S_{T^1}^+| / n^2, & T^2 = T^1 \\
0, & \text{otherwise},
\end{cases}
\qquad (3)
$$

where $T^1$ and $T^2$ differ in the (i, j)th element, $t^1_{ij}$ is the (i, j)th element
of $T^1$, $1/\mu$ is the mean service time for messages, $\lambda/n^2$ is the expected
rate at which messages between nodes i and j are generated, and |A|
denotes the number of elements in the set A.
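The rates in (3) can be checked on a small state. The Python sketch below uses hypothetical helper names, indexes nodes from 0, and counts all $n^2$ origin-destination pairs, as the traffic model does. It lists the off-diagonal rates out of a traffic matrix T. It then checks that their sum equals the magnitude of the diagonal entry of (3), $\mu k + \lambda |S_T^+|/n^2$.

```python
def satisfies_constraints(T, m, c):
    """Constraint set (1) for an n x n traffic matrix given as a list of rows."""
    rows = [sum(r) for r in T]
    cols = [sum(col) for col in zip(*T)]
    return max(rows) <= c and max(cols) <= c and sum(rows) <= m * c

def outgoing_rates(T, lam, mu, m, c):
    """Nonzero off-diagonal rates r(T, T') of (3), keyed by T' as a tuple of rows."""
    n = len(T)
    rates = {}
    for i in range(n):
        for j in range(n):
            if T[i][j] > 0:                          # departure: T' = T - e_i e_j'
                down = [list(r) for r in T]
                down[i][j] -= 1
                rates[tuple(map(tuple, down))] = mu * T[i][j]
            up = [list(r) for r in T]
            up[i][j] += 1
            if satisfies_constraints(up, m, c):      # admissible arrival: T' = T + e_i e_j'
                rates[tuple(map(tuple, up))] = lam / n**2
    return rates

def p_fit(T, m, c):
    """|S_T^+| / n^2: share of the n^2 origin-destination pairs whose arrival fits."""
    n = len(T)
    admissible = 0
    for i in range(n):
        for j in range(n):
            up = [list(r) for r in T]
            up[i][j] += 1
            admissible += satisfies_constraints(up, m, c)
    return admissible / n**2

T = [[2, 0], [0, 0]]                      # hypothetical state: n = 2, m = 2, c = 2
lam, mu = 4.0, 1.0
k = sum(map(sum, T))
out = sum(outgoing_rates(T, lam, mu, 2, 2).values())
print(out, mu * k + lam * p_fit(T, 2, 2))   # → 3.0 3.0
```

In this state only one of the four pairs fits, even though two of the four slots in the frame are free. This is the effect of the row and column limits in (1).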

Because the performance of a TDMA assignment algorithm is 
measured in terms of steady-state network behavior, we focus on the 
limiting steady-state distribution of the state variable T. To prove 
that the steady-state distribution exists, note that the state space 
consists of matrices with integer elements that satisfy the constraint 
set (1), and it is therefore finite. The transition rates given in (3) do 
not depend explicitly on time, so the process is time homogeneous. 
Finally, it can be shown that the process is irreducible and contains no
periodicities. Therefore, the steady-state distribution exists. Although 
we obtain a closed-form expression for the steady-state distribution 
in Appendix A, it can be explicitly computed only in very few cases. 
We therefore develop an alternative method of analyzing the system, 
which does not require the explicit formula. This approach, moreover, 
provides valuable insight into our problem and allows us to provide a 
unified treatment of both optimal- and fixed-assignment algorithms. 

†The function $r(\cdot, \cdot)$ is technically the infinitesimal generator of the Markov process
associated with optimal assignment.

The next step is to derive the equilibrium equations that the steady-
state probability distribution p(T) must satisfy. Using the notation
introduced earlier, we have the flow equation

$$\sum_{T' \in S,\ T' \ne T} p(T)\, r(T, T') = \sum_{T' \in S,\ T' \ne T} p(T')\, r(T', T), \qquad T \in S. \qquad (4)$$


The intuitive meaning of (4) is that for every state T, the probability 
flow rate out of T must equal the probability flow rate into T. 

The following lemma may be used to derive an alternative form for 
(4). Let |A| represent the number of elements in the set A and let 
p(f|T) denote the probability, conditioned on state T, that an arrival, 
chosen from the uniform distribution of origin-destination pairs, does 
not violate the set of matrix constraints; i.e., $p(f \mid T)$ is the probability
that an arrival “fits” (can be assigned). Then we have 


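Lemma 1: The probability flow out of state $T \in S$ satisfies

$$\sum_{T' \in S,\ T' \ne T} p(T)\, r(T, T') = \sum_{T' \in S_T^-} p(T)\, r(T, T') + \sum_{T' \in S_T^+} p(T)\, r(T, T') \qquad (5a)$$

$$= \mu k\, p(T) + \lambda\, p(f \mid T)\, p(T), \qquad (5b)$$

where $\mathbf{1}'T\mathbf{1} = k$, which equals the number of occupied time slots, and

$$p(f \mid T) = \frac{|S_T^+|}{n^2}. \qquad (6)$$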

The proof of this lemma appears in Appendix B. 

This lemma shows that the flow out of any state T is composed of 
two terms: the first term in (5b), $\mu k\, p(T)$, reflects transitions that
occur because there is a departure, while the second term,
$\lambda\, p(f \mid T)\, p(T)$, reflects transitions that occur because an admissible
request is generated. 

Applying Lemma 1 to the left side of (4) and noting that, if
$T' \ne T$, $r(T', T) \ne 0$ if and only if $T' \in S_T^-$ or $T' \in S_T^+$, we obtain
the flow equation

$$\mu k\, p(T) + \lambda\, p(f \mid T)\, p(T) = \sum_{T' \in S_T^-} p(T')\, r(T', T) + \sum_{T' \in S_T^+} p(T')\, r(T', T), \qquad T \in S. \qquad (7)$$

Equation (7) describes the probabilistic flows into and out of each 
state. In principle, (7) can be solved to yield a closed form solution for 
the steady-state distribution of the state variable, p(T) (see Appendix 
A). Unfortunately, it is very difficult to evaluate this expression for 
most cases of interest. In order to calculate the system blocking
probability, $P_B$, however, it is unnecessary to evaluate the steady-state
distribution of T. It is shown in the next section that it is enough to
know only certain aggregate quantities that depend only on the number
of messages in the system, k. Notice that the “state” k is an aggregate
state composed of all traffic matrices $T \in S_k$. We therefore are
interested in deriving an equation analogous to (7) that describes the
steady-state flows between the sets of states $S_k$, $k = 0, 1, 2, \ldots$. A
direct technique for obtaining this equation is to sum both sides of (7)
over the set $T \in S_k$.

Let $p_O(k)$ be the steady-state probability that the state of the system
is contained in $S_k$, which is the probability that there are k occupied
time slots; and let $p_O(f \mid k)$ be, in analogy with $p(f \mid T)$, the probability
of a fit given that the system contains k messages, i.e.,

$$p_O(k) = \sum_{T \in S_k} p(T) \qquad (8a)$$

$$p_O(f \mid k) = \frac{\sum_{T \in S_k} p(f \mid T)\, p(T)}{p_O(k)}, \qquad (8b)$$

where the O subscript indicates that an optimal assignment algorithm
is assumed. The following theorem gives the flow equations for the
distribution $p_O(k)$.

Theorem 1: The probabilities $\{p_O(k)\}$ satisfy the equations

$$[\mu k + \lambda\, p_O(f \mid k)]\, p_O(k) = \mu (k+1)\, p_O(k+1) + \lambda\, p_O(f \mid k-1)\, p_O(k-1), \qquad (9)$$

where $p_O(k) = 0$ for $k < 0$ or $k > mc$.


The proof of this theorem is given in Appendix C. 

Equation (9) expresses the equality of probability flows between the 
sets $S_k$. Given that the system is in state T, the next transition can
only be into a state $T' \in S_T^-$ or $T' \in S_T^+$. Suppose, for example, that
$T \in S_k$; then $T' \in S_{k-1}$ or $T' \in S_{k+1}$; i.e., transitions out of any state
in $S_k$ are always into a state in $S_{k-1}$ or $S_{k+1}$. Note that (9) is a set of
“birth-death” equations, which we will solve in Section IV to give an
expression for $\{p_O(k)\}$ in terms of $\lambda$, $\mu$, and $\{p_O(f \mid k)\}$. (These equations
do not, however, imply that the k process is Markov.) In contrast to
the solution of (7), we show in Section IV that it is possible to obtain
a close approximation to the solution of (9) that can be easily evalu-
ated.
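For a very small system, the optimal-assignment chain can be solved by brute force, which allows Theorem 1 to be checked numerically. The Python sketch below uses hypothetical parameters (n = m = c = 2, λ = 1.5, μ = 1) and hypothetical helper names. It enumerates S, builds the rates of (3), and solves the flow equations (4) for p(T). It then aggregates to $p_O(k)$ and $p_O(f \mid k)$ as in (8). Finally, it reports the blocking probability of (17) in Section IV together with the largest residual of (9). The state space grows very quickly with n, m, and c, so this approach is practical only for toy cases.

```python
from itertools import product

def fits(state, n, m, c):
    """Constraint set (1) for a traffic matrix stored as a flat row-major tuple."""
    rows = [sum(state[r * n:(r + 1) * n]) for r in range(n)]
    cols = [sum(state[r * n + q] for r in range(n)) for q in range(n)]
    return max(rows) <= c and max(cols) <= c and sum(rows) <= m * c

def solve(A, b):
    """Dense Gauss-Jordan elimination with partial pivoting (tiny systems only)."""
    size = len(b)
    M = [list(row) + [b[r]] for r, row in enumerate(A)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(size):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[r][size] / M[r][r] for r in range(size)]

def optimal_assignment_chain(n, m, c, lam, mu):
    """Hypothetical helper: stationary p(T) and p(f|T) for the chain of Section 3.2."""
    S = [s for s in product(range(c + 1), repeat=n * n) if fits(s, n, m, c)]
    index = {s: a for a, s in enumerate(S)}
    N = len(S)
    Q = [[0.0] * N for _ in range(N)]
    p_fit = []
    for s in S:
        a, admissible = index[s], 0
        for cell in range(n * n):
            if s[cell] > 0:                                  # departure: rate mu * t_ij
                down = s[:cell] + (s[cell] - 1,) + s[cell + 1:]
                Q[a][index[down]] += mu * s[cell]
            up = s[:cell] + (s[cell] + 1,) + s[cell + 1:]
            if fits(up, n, m, c):                            # admissible arrival: lam / n^2
                Q[a][index[up]] += lam / n**2
                admissible += 1
        Q[a][a] = -sum(Q[a])                                 # diagonal entry of (3)
        p_fit.append(admissible / n**2)                      # eq. (6)
    # Solve p Q = 0 with sum(p) = 1: transpose Q and replace one equation.
    A = [[Q[r][q] for r in range(N)] for q in range(N)]
    A[-1] = [1.0] * N
    p = solve(A, [0.0] * (N - 1) + [1.0])
    return S, p, p_fit

n, m, c, lam, mu = 2, 2, 2, 1.5, 1.0                          # hypothetical toy system
S, p, p_fit = optimal_assignment_chain(n, m, c, lam, mu)
P_B = 1.0 - sum(f * q for f, q in zip(p_fit, p))              # eq. (17)

K = m * c
p_k, num = [0.0] * (K + 1), [0.0] * (K + 1)
for s, q, f in zip(S, p, p_fit):                              # aggregation (8a)-(8b)
    p_k[sum(s)] += q
    num[sum(s)] += f * q
phi = [num[k] / p_k[k] for k in range(K + 1)]                 # p_O(f|k)

def residual(k):
    """Left side minus right side of the birth-death equation (9) at level k."""
    left = (mu * k + lam * phi[k]) * p_k[k]
    right = mu * (k + 1) * p_k[k + 1] if k < K else 0.0
    right += lam * phi[k - 1] * p_k[k - 1] if k > 0 else 0.0
    return left - right

print(P_B, max(abs(residual(k)) for k in range(K + 1)))
```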


3.3 Equilibrium state equations for fixed assignment 


To derive the flow equations [corresponding to (5), (7), and (9)] for 
the Markov process defined by a fixed-assignment algorithm, we 
redefine the set of states as the set of channel-time slot matrices M 
defined in Section II. The following notation is analogous to (2): 


$$L = \{M \mid M \text{ is an admissible channel-time slot matrix}\} \qquad (10a)$$

$$L_k = \{M \mid M \in L,\ M \text{ contains } k \text{ units of traffic}\} \qquad (10b)$$

$$L_M^- = \{M^* \mid M^* \in L,\ \exists (i \to j) \text{ and } (m^*, c^*) \text{ such that } M^* = M - (i \to j)\, e_{m^*} e_{c^*}'\} \qquad (10c)$$

$$L_M^+ = \{M^* \mid M^* \in L,\ \exists (i \to j) \text{ and } (m^*, c^*) \text{ such that } M^* = M + (i \to j)\, e_{m^*} e_{c^*}'\}, \qquad (10d)$$

where $(i \to j)$ represents one unit of traffic from node i to node j, $e_{m^*}$
is the m × 1 vector with 1 in the m*th place and 0 elsewhere, $e_{c^*}$ is the
c × 1 vector with 1 in the c*th place and 0 elsewhere, and the notation
$M^* = M \pm (i \to j)\, e_{m^*} e_{c^*}'$ means that the channel-time slot matrices $M^*$
and M differ from each other only in the $(m^*, c^*)$ position. A + sign
indicates that $M^*$ contains the traffic pair $(i \to j)$ in the $(m^*, c^*)$
position, whereas the $(m^*, c^*)$ position in M is empty; the − sign
indicates that M contains the traffic pair $(i \to j)$ in the $(m^*, c^*)$
position, whereas the $(m^*, c^*)$ position in $M^*$ is empty.
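Let $r(M^1, M^2)$ be the transition rate between any two states $M^1$ and
$M^2$.* Then we have

$$
r(M^1, M^2) =
\begin{cases}
\mu, & M^2 \in L_{M^1}^- \\
\lambda(M^1, M^2), & M^2 \in L_{M^1}^+ \\
-|M^1|\,\mu - \sum_{M' \in L_{M^1}^+} \lambda(M^1, M'), & M^2 = M^1 \\
0, & \text{otherwise},
\end{cases}
\qquad (11)
$$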


where $|M^1|$ is the total number of traffic pairs in $M^1$. The transition
rate $\lambda(M^1, M^2)$ is the rate at which a new arrival $(i \to j)$ is assigned to
$M^1$, thereby changing $M^1$ to $M^2$. Let $a_{M^1}^{ij}$ denote the number of slots
in $M^1$ into which the arrival $(i \to j)$ can be assigned. If assignments
are made by selecting one out of the $a_{M^1}^{ij}$ available slots from a
uniform distribution, then

$$\lambda(M^1, M^2) = \frac{\lambda}{n^2\, a_{M^1}^{ij}},$$

where $\lambda$ is the total traffic-arrival rate. This fixed-assignment scheme,
where traffic arrivals are randomly assigned to available time slots 
according to a uniform distribution, will henceforth be referred to as 
random assignment. It is not necessary, however, to assume random 
assignment in order to derive the flow equation that follows. 
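Random assignment can be sketched in Python as follows. The frame layout is an assumption made for illustration. M is a list of m channel rows, each holding c entries that are either None (empty) or a (transmitter, receiver) pair. Consistent with the separate transmitter and receiver limits in (1a) and (1b), the fundamental constraint is read here as forbidding a node to transmit on two channels, or to receive on two channels, in the same time slot. `available_slots` returns the $a_M^{ij}$ candidate positions. All function names are hypothetical.

```python
import random

def available_slots(M, i, j):
    """Empty (channel, slot) positions where arrival i -> j fits without rearrangement.

    Hypothetical layout: M[ch][t] is None or a (transmitter, receiver) pair.
    Within one slot a node may transmit on at most one channel and receive
    on at most one channel.
    """
    m, c = len(M), len(M[0])
    slots = []
    for t in range(c):
        busy = [M[ch][t] for ch in range(m) if M[ch][t] is not None]
        if any(tx == i for tx, _ in busy) or any(rx == j for _, rx in busy):
            continue                              # i already transmits or j already receives
        slots.extend((ch, t) for ch in range(m) if M[ch][t] is None)
    return slots

def random_assign(M, i, j, rng=random):
    """Random assignment: choose one of the a_M^{ij} slots uniformly, or block."""
    slots = available_slots(M, i, j)
    if not slots:
        return None                               # the arrival is blocked
    ch, t = rng.choice(slots)
    M[ch][t] = (i, j)
    return ch, t

def fit_probability(M, nodes):
    """Eq. (13): |A_M| / n^2, the share of origin-destination pairs that still fit."""
    pairs = [(i, j) for i in nodes for j in nodes]
    return sum(1 for i, j in pairs if available_slots(M, i, j)) / len(pairs)

M = [[(1, 2), None],
     [None, None]]                                # hypothetical frame: m = 2, c = 2
print(len(available_slots(M, 1, 2)), len(available_slots(M, 2, 1)))   # → 2 3
```

Under this rule, each admissible arrival moves to one specific slot at rate $\lambda / (n^2 a_M^{ij})$, which matches the expression for $\lambda(M^1, M^2)$ above.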

*The function $r(\cdot, \cdot)$ is the infinitesimal generator for the associated fixed-
assignment process.

We denote the steady-state probability distribution of the Markov
process for fixed assignment as p(M). The existence proof for p(M) is
the same as that for p(T). Also, let $p(f \mid M)$ denote the conditional
probability, given that the state is M, that an arrival chosen from the
uniform distribution of origin-destination pairs can be assigned with-
out reassigning any existing traffic, i.e., the probability of a fit under
the fixed-assignment rule. Finally, we define


$$A_M = \{(i \to j) \mid a_M^{ij} > 0\} \qquad (12)$$

as the set of all traffic pairs that can be assigned to M.

Proceeding as in the case of optimal assignment, we can derive the
flow equations for the fixed-assignment case by merely changing
notation. The probability of a fit, $p(f \mid M)$, in this case is simply the
number of traffic pairs whose arrivals can be assigned under the fixed-
assignment rule divided by the total number of possible pairs. Thus,
in analogy with (6) we have

$$p(f \mid M) = \frac{|A_M|}{n^2}, \qquad (13)$$

where $A_M$ is defined in (12). In analogy with the flow equations (7)
and (9) for optimal assignment, the fixed-assignment flow equations
are


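$$\mu k\, p(M) + \lambda\, p(f \mid M)\, p(M) = \sum_{M' \in L_M^-} p(M')\, r(M', M) + \sum_{M' \in L_M^+} p(M')\, r(M', M), \qquad M \in L_k,\ k = 1, \ldots, mc, \qquad (14)$$

and

$$[\mu k + \lambda\, p_F(f \mid k)]\, p_F(k) = \mu (k+1)\, p_F(k+1) + \lambda\, p_F(f \mid k-1)\, p_F(k-1), \qquad k = 1, \ldots, mc, \qquad (15)$$

where

$$p_F(k) = \sum_{M \in L_k} p(M) \qquad (16a)$$

$$p_F(f \mid k) = \frac{\sum_{M \in L_k} p(f \mid M)\, p(M)}{p_F(k)}, \qquad (16b)$$

and the F subscript indicates that a fixed-assignment algorithm is
assumed. Note that (15) is, like (9), a system of birth-death equations.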


IV. PERFORMANCE ANALYSIS 
4.1 Properties of blocking probabilities 

The performance of a multichannel TDMA blocking system is 
measured in terms of the steady-state probability that an arrival is 


blocked or lost. Of course, this blocking probability depends on the 
assignment algorithm. Conditioned on the state of the system, the 
blocking probability is simply 1 — p(f|T) or 1 — p(f|M), for optimal
or fixed assignment, respectively. The unconditional blocking proba- 
bility is given by 


$$1 - P_B^O = \sum_{T \in S} p(f \mid T)\, p(T) \qquad (17)$$

or

$$1 - P_B^F = \sum_{M \in L} p(f \mid M)\, p(M), \qquad (18)$$

where $P_B^O$ and $P_B^F$ are the blocking probabilities obtained using optimal-
and fixed-assignment algorithms, respectively. As discussed in Section III and
Appendix A, it is extremely difficult to compute p(T) (or p(M)) for most systems
of interest, and hence (17) and (18) cannot be used directly to compute
$P_B^O$ and $P_B^F$. In this section, however, we derive bounds and approxi-
mations for (17) and (18) by using the aggregate-state equations (9)
and (15). This provides a means of quantitatively comparing assignment
algorithms and estimating the performance of the system as parame-
ters such as n, m, and c vary.

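We obtain equivalent expressions for the blocking probabilities as
follows:

$$1 - P_B^O = \sum_{T \in S} p(f \mid T)\, p(T) = \sum_{k=0}^{mc} \sum_{T \in S_k} p(f \mid T)\, p(T) = \sum_{k=0}^{mc} p_O(f \mid k)\, p_O(k), \qquad (19)$$

where $p_O(f \mid k)$ and $p_O(k)$ satisfy (8); similarly, $P_B^F$ satisfies

$$1 - P_B^F = \sum_{k=0}^{mc} p_F(f \mid k)\, p_F(k), \qquad (20)$$

where $p_F(f \mid k)$ and $p_F(k)$ satisfy (16).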
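Equations (19) and (20) are particular cases of the following general
expression for blocking probability:

$$1 - P_B = \sum_{k=0}^{mc} \phi_k\, p(k), \qquad (21)$$

where $\phi_k$ and p(k) satisfy the birth-death system of equations

$$[\mu k + \lambda \phi_k]\, p(k) = \mu (k+1)\, p(k+1) + \lambda \phi_{k-1}\, p(k-1), \qquad (22)$$

subject to the boundary conditions $\phi_{mc} = 0$ and $p(k) = 0$ for $k < 0$ or
$k > mc$, and $\sum_k p(k) = 1$. Solving (22) and substituting into (21) yields
an expression for the blocking probability in terms of the $\phi_k$'s; i.e.,

$$p(k) = \frac{\dfrac{\rho^k}{k!} \prod_{i=0}^{k-1} \phi_i}{\sum_{j=0}^{mc} \dfrac{\rho^j}{j!} \prod_{i=0}^{j-1} \phi_i}, \qquad (23)$$

and from (21)

$$P_B = 1 - \frac{\sum_{k=0}^{mc} \dfrac{\rho^k}{k!} \prod_{i=0}^{k} \phi_i}{\sum_{j=0}^{mc} \dfrac{\rho^j}{j!} \prod_{i=0}^{j-1} \phi_i}, \qquad (24)$$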

where $\rho = \lambda/\mu$ and an empty product equals one. If $\phi_k = p_O(f \mid k)$ or $\phi_k = p_F(f \mid k)$, then $P_B = P_B^O$ or $P_B = P_B^F$, respectively. Another case of particular interest is $\phi_k = 1$ for $0 \le k < mc$ and $\phi_k = 0$ otherwise. We denote this set of probabilities, which corresponds to the $mc$-server Erlang-loss system, as $p_E(f \mid k)$. The corresponding blocking probability, $P_B^E$, is called Erlang's loss or B formula. The Erlang formula applies to a single-channel TDMA system in which only constraint (1c) in (1) must hold. An important property of $P_B$, stated in the following lemma and useful for deriving bounds on assignment-algorithm performance, is that $P_B$ is a monotonically nonincreasing function of each $\phi_k$, $k = 0, 1, \ldots, mc$.
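Equation (24) is easy to evaluate once the fit probabilities are known. The following is a minimal Python sketch, assuming the $\phi_k$ are supplied as a list `phi` of length $mc + 1$ with `phi[mc] == 0`. The names `blocking_probability` and `erlang_b` are illustrative and do not come from the paper. Setting $\phi_k = 1$ for $k < mc$ should reproduce Erlang's B formula, which the sketch computes independently with the standard recursion $B(s) = \rho B(s-1)/[s + \rho B(s-1)]$.

```python
from math import factorial


def blocking_probability(phi, rho):
    """Evaluate (24) for fit probabilities phi[0..mc] (phi[mc] must be 0), rho = lambda/mu."""
    mc = len(phi) - 1
    if phi[mc] != 0:
        raise ValueError("boundary condition phi_mc = 0 is required")
    num = 0.0   # sum_{k=0}^{mc-1} rho^k/k! * prod_{j=0}^{k} phi_j
    den = 0.0   # sum_{k=0}^{mc} rho^k/k! * prod_{j=0}^{k-1} phi_j
    prod = 1.0  # running product prod_{j=0}^{k-1} phi_j (empty product = 1)
    for k in range(mc + 1):
        term = rho**k / factorial(k) * prod
        den += term
        if k < mc:
            num += term * phi[k]
        prod *= phi[k]
    return 1.0 - num / den


def erlang_b(servers, rho):
    """Erlang's loss (B) formula via the usual numerically stable recursion."""
    b = 1.0
    for s in range(1, servers + 1):
        b = rho * b / (s + rho * b)
    return b


if __name__ == "__main__":
    mc, rho = 20, 15.0                    # e.g., 4 channels x 5 time slots
    phi_erlang = [1.0] * mc + [0.0]       # p_E(f | k)
    print(blocking_probability(phi_erlang, rho), erlang_b(mc, rho))
```

The two printed values should agree to rounding error. For larger systems, the recursion is the safer way to evaluate the Erlang case.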


Lemma 2: The blocking probability $P_B = P_B(\phi_0, \ldots, \phi_{mc})$ satisfies

$$\frac{\partial P_B}{\partial \phi_k} \le 0, \qquad k = 0, \ldots, mc. \qquad (25)$$


The proof of this lemma appears in Appendix D. 
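Before reading the proof, Lemma 2 can be spot-checked numerically. The sketch below is our own illustration, assuming random fit probabilities in $[0, 1]$ with $\phi_{mc} = 0$. It raises one $\phi_k$ at a time and confirms that (24) never increases. Only $k < mc$ is varied, because (24) does not depend on $\phi_{mc}$. This is a sanity check, not a proof.

```python
import random
from math import factorial


def blocking_probability(phi, rho):
    """(24): P_B as a function of phi[0..mc] with phi[mc] = 0."""
    mc = len(phi) - 1
    num, den, prod = 0.0, 0.0, 1.0
    for k in range(mc + 1):
        term = rho**k / factorial(k) * prod   # rho^k/k! * prod_{j<k} phi_j
        den += term
        if k < mc:
            num += term * phi[k]
        prod *= phi[k]
    return 1.0 - num / den


def lemma2_holds(mc, rho, trials=200, step=0.1, seed=1):
    """True if raising any single phi_k (k < mc) never increases P_B in the sampled cases."""
    rng = random.Random(seed)
    for _ in range(trials):
        phi = [rng.random() for _ in range(mc)] + [0.0]
        base = blocking_probability(phi, rho)
        for k in range(mc):
            bumped = list(phi)
            bumped[k] = min(1.0, phi[k] + step)
            if blocking_probability(bumped, rho) > base + 1e-12:  # tolerance for rounding
                return False
    return True


print(lemma2_holds(mc=6, rho=3.0), lemma2_holds(mc=12, rho=10.0))
```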

It is intuitively reasonable that the blocking probability should 
increase as more system constraints are added. This is indeed the case, 
as the next theorem shows. 


Theorem 2: The optimal assignment, fixed assignment, and Erlang blocking probabilities satisfy

$$P_B^E \le P_B^O \le P_B^F. \qquad (26)$$


Proof: If the system is in state $k$, then an arrival can be assigned by a fixed-assignment algorithm only if it can be assigned by an optimal algorithm, so for all $k$ we have $p_F(f \mid k) \le p_O(f \mid k)$. It is clear that $p_O(f \mid k) \le p_E(f \mid k)$ for all $k$. Therefore, (25) implies (26). Q.E.D.

As the ratio of the number of nodes n to the number of channels m 
increases, it becomes less likely that an arrival will match any of the 
traffic pairs already assigned to a given time slot. Consequently, for 
large n/m, we expect that if there is an open slot, it should be relatively 
easy to assign a new traffic request. To be more precise, as n/m
increases, the system performance approaches that of an Erlang sys- 
tem. 


Theorem 3:

$$\lim_{n/m \to \infty} P_B^O = \lim_{n/m \to \infty} P_B^F = P_B^E. \qquad (27)$$


Proof: Theorem 2 implies that it is sufficient to show that $P_B^F \to P_B^E$. For all $k \le mc - 1$ we have $p_F(f \mid k) \ge p_F(f \mid mc - 1)$. Because the nodes are symmetric, the probability of a fit given one empty slot is

$$p_F(f \mid mc - 1) = \left(\frac{n - m + 1}{n}\right)^2,$$

which is the probability that the origin and destination nodes of an arrival chosen from a uniform distribution do not match $(m - 1)$ randomly chosen origin-destination pairs. Thus, we have

$$\lim_{n/m \to \infty} p_F(f \mid k) = \begin{cases} 1, & k < mc, \\ 0, & k = mc, \end{cases}$$

which is $p_E(f \mid k)$. Because (24) is a continuous function of the $\phi_k$'s, we have $\lim_{n/m \to \infty} P_B^F = P_B^E$. Q.E.D.
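The limit in Theorem 3 can also be illustrated numerically. The proof uses $((n - m + 1)/n)^2 = p_F(f \mid mc - 1) \le p_F(f \mid k) \le 1$ for $k < mc$. By Lemma 2, $P_B^F$ therefore lies between $P_B^E$ and the value of (24) obtained with every $\phi_k$ ($k < mc$) set to $q = ((n - m + 1)/n)^2$. The sketch below evaluates that upper end for increasing $n$. The helper names are hypothetical. With all $\phi_k$ equal to a constant $q$, (24) reduces algebraically to $1 - q[1 - B(mc, \rho q)]$, where $B$ is Erlang's formula, and the test uses this as a cross-check.

```python
from math import factorial


def blocking_probability(phi, rho):
    """(24): P_B for fit probabilities phi[0..mc] with phi[mc] = 0."""
    mc = len(phi) - 1
    num, den, prod = 0.0, 0.0, 1.0
    for k in range(mc + 1):
        term = rho**k / factorial(k) * prod
        den += term
        if k < mc:
            num += term * phi[k]
        prod *= phi[k]
    return 1.0 - num / den


def erlang_b(servers, rho):
    """Erlang's B formula by the standard recursion."""
    b = 1.0
    for s in range(1, servers + 1):
        b = rho * b / (s + rho * b)
    return b


def fixed_assignment_upper(n, m, c, rho):
    """(24) with phi_k = ((n - m + 1)/n)^2 for k < mc; an upper end for P_B^F via Lemma 2."""
    q = ((n - m + 1) / n) ** 2
    mc = m * c
    return blocking_probability([q] * mc + [0.0], rho)


m, c, rho = 4, 5, 15.0
lower = erlang_b(m * c, rho)                  # P_B^E
for n in (10, 50, 500, 5000):
    print(n, lower, fixed_assignment_upper(n, m, c, rho))
```

As $n$ grows with $m$ fixed, the upper end decreases toward the Erlang value, which is the behavior the theorem describes.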

Although it is easy to compute the Erlang lower bound $P_B^E$, it is extremely difficult to calculate $P_B^O$ or $P_B^F$. The difficulty lies in calculating $p_O(f \mid k)$ and $p_F(f \mid k)$ as functions of $k$. In particular, notice from (8b) and (16b) that the state probabilities $p(T)$, $T \in S_k$, and $p(M)$, $M \in L_k$, must be known. The aggregate flow equations (9) and (15) have therefore not made the exact computation of $P_B^O$ and $P_B^F$ any easier. However, the advantage of using these equations is that we can derive an accurate approximation for $p_F(f \mid k)$, which in turn lets us calculate an approximation for the corresponding blocking probability.


4.2 Approximate calculation of blocking probability with random assignment

In order to approximate the blocking probability resulting from fixed assignment, we first approximate $p_F(f \mid k)$ and then substitute the resulting expression for $\phi_k$ in (24). Let $\mathbf{k} = (k_1, \ldots, k_c)$ be the vector of time slot occupancy; i.e., there are $k_i$ units of traffic in the $i$th column of the channel-time slot matrix. The probability of a fit can be expressed as

$$p_F(f \mid k) = \sum_{\mathbf{k} \in \Omega_k} p_F(f \mid \mathbf{k})\, p_F(\mathbf{k} \mid k), \qquad (28)$$

where $\Omega_k$ is the set of all occupancy vectors $\mathbf{k}$ such that $\sum_i k_i = k$, $p_F(f \mid \mathbf{k})$ is the conditional fit probability given that the occupancy vector is $\mathbf{k}$, and $p_F(\mathbf{k} \mid k)$ is the conditional probability that the occupancy vector is $\mathbf{k}$ given that there are $k$ units of traffic in the system.

To calculate $p_F(f \mid \mathbf{k})$, note that (1) the $k_i$ traffic units in time slot $i$ are characterized by $k_i$ pairs of integers between 1 and $n$ that satisfy the fundamental constraint, and because the traffic between nodes is assumed to be symmetric, these integer pairs are equally likely; and (2) if $k_i = m$, then no new request can be assigned to the $i$th time slot. We also use the following assumption: (3) information about traffic in time slot $i$ provides no information about traffic in a different time slot $j$ (traffic in different slots is independent). This assumption is certainly very accurate for a large number of nodes; however, it is quite difficult to prove or disprove in general. An arriving unit of traffic can be assigned to time slot $i$ if and only if its origin does not match any of the $k_i$ origins and its destination does not match any of the $k_i$ destinations of traffic already assigned to column $i$. Observation (1) implies that this event has probability $[(n - k_i)/n]^2$. A unit of traffic cannot be assigned, i.e., does not fit, if and only if it does not fit into any time slot. Thus, using observations (2) and (3), the probability of no fit is the product

$$\prod_{\{i \mid k_i < m\}} \left[1 - \left(\frac{n - k_i}{n}\right)^2\right],$$

and hence the probability of a fit, given $\mathbf{k}$, is

$$p_F(f \mid \mathbf{k}) = 1 - \prod_{\{i \mid k_i < m\}} \left[1 - \left(\frac{n - k_i}{n}\right)^2\right], \qquad (29)$$

where the product is taken to be one if $\{i \mid k_i < m\}$ is the empty set.
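Equation (29) can be transcribed directly. The sketch below assumes an occupancy vector given as a list of column counts $k_i$ and uses exact fractions. The helper name `fit_probability` is ours. A column with no traffic makes the fit certain, and a completely full matrix gives a fit probability of zero.

```python
from fractions import Fraction


def fit_probability(occupancy, n, m):
    """(29): p_F(f | k-vector) for column counts k_i, n nodes, m channels (relies on observation (3))."""
    no_fit = Fraction(1)
    for k_i in occupancy:
        if k_i < m:  # full columns (k_i == m) can never take the arrival
            no_fit *= 1 - Fraction(n - k_i, n) ** 2
    return 1 - no_fit  # empty product (all columns full) gives fit probability 0


print(fit_probability([1, 1], n=3, m=2))   # 56/81
print(fit_probability([2, 2], n=3, m=2))   # 0
```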

Intuition suggests that, given $k$ units of traffic in the system, the occupied slots are uniformly distributed throughout the channel-time slot matrix when random assignment is used. That is,

$$p_F(\mathbf{k} \mid k) = \frac{\prod_{i=1}^{c} \binom{m}{k_i}}{\binom{mc}{k}}, \qquad (30)$$

where the numerator is the number of ways to have $k_i$ units of traffic in slot $i$ ($i = 1, \ldots, c$) and the denominator is the total number of ways to have $k$ units of traffic in the system. The following example shows, however, that this intuition is misleading and that (30) is not the correct distribution.
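For small systems, the approximation (30) can be tabulated by brute force. The sketch below uses hypothetical helper names. It enumerates $\Omega_k$ as all vectors $(k_1, \ldots, k_c)$ with $0 \le k_i \le m$ summing to $k$, and evaluates (30) exactly with fractions. By Vandermonde's identity, the probabilities over each $\Omega_k$ sum to one, and the code checks this.

```python
from fractions import Fraction
from itertools import product
from math import comb


def occupancy_vectors(k, m, c):
    """Omega_k: all column-count vectors (k_1, ..., k_c) with 0 <= k_i <= m and sum k."""
    return [v for v in product(range(m + 1), repeat=c) if sum(v) == k]


def uniform_occupancy_prob(vec, m, c):
    """Approximation (30): prod_i C(m, k_i) / C(mc, k)."""
    num = 1
    for k_i in vec:
        num *= comb(m, k_i)
    return Fraction(num, comb(m * c, sum(vec)))


m, c = 4, 5
for k in range(m * c + 1):
    total = sum(uniform_occupancy_prob(v, m, c) for v in occupancy_vectors(k, m, c))
    assert total == 1, (k, total)
print("(30) sums to one over every Omega_k for m = 4, c = 5")
```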

Consider a system with $n = 3$, $m = 2$, $c = 2$. Figure 2 is a transition diagram illustrating the transition rates into the states corresponding to $k = 2$. The ordered pairs, e.g., (1, 1), represent the occupancy vectors $\mathbf{k}$ for channel-time slot matrices containing two units of traffic, and $q$ is the probability that an arriving unit of traffic can be assigned to the column containing the single unit already assigned.

Fig. 2—Transition diagram for example with two channels, two time slots, and three nodes.

Using (30), it is easily verified that the uniform-occupancy assumption implies

$$p_F[(1,1) \mid k=2] = 2\{p_F[(0,2) \mid k=2] + p_F[(2,0) \mid k=2]\}. \qquad (31)$$

By writing the flow equations and using the fact that $p_F(\mathbf{k}) = p_F(\mathbf{k} \mid k)\, p_F(k)$, it is easy to show that (31) can be satisfied only if the rate from $k = 1$ into (1, 1) is twice the rate into the pair of states (0, 2) and (2, 0), i.e., only if $\lambda - \lambda q/3 = 2\lambda q/3$. But this equation is satisfied only if $q = 1$, which is clearly not true for this system. Therefore, (30) is not correct, and the assumption of uniformly distributed occupancy is not strictly true. We may, however, approximate $p_F(\mathbf{k} \mid k)$ by (30). Combined with (28) and with the expression (29) for $p_F(f \mid \mathbf{k})$, this gives an approximation to $p_F(f \mid k)$ when random assignment is used. That approximation may, in turn, be used in (24) to provide an approximation to $P_B^F$.
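The arithmetic in this example can be checked exactly. The sketch below takes $n = 3$, $m = 2$, $c = 2$ as in the text. It confirms that (30) puts twice as much mass on (1, 1) as on (0, 2) and (2, 0) combined. It then evaluates $q = [(n - 1)/n]^2 = 4/9$ from observation (1) with $k_i = 1$, and shows that the rate condition $\lambda - \lambda q/3 = 2\lambda q/3$ fails for that $q$ but holds for $q = 1$. Rates are expressed in units of $\lambda$, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def uniform_occupancy_prob(vec, m, c):
    """Approximation (30) evaluated exactly."""
    num = 1
    for k_i in vec:
        num *= comb(m, k_i)
    return Fraction(num, comb(m * c, sum(vec)))


n, m, c = 3, 2, 2

# (31): under (30), P[(1,1) | k=2] = 2 * (P[(0,2) | k=2] + P[(2,0) | k=2]).
p11 = uniform_occupancy_prob((1, 1), m, c)
p_packed = uniform_occupancy_prob((0, 2), m, c) + uniform_occupancy_prob((2, 0), m, c)
print(p11, p_packed)            # 2/3 1/3

# q: the arrival fits the column already holding one unit (observation (1) with k_i = 1).
q = Fraction(n - 1, n) ** 2     # 4/9


def rate_condition_holds(q):
    """Condition from the text: rate into (1,1) equals twice the rate into {(0,2), (2,0)}."""
    rate_into_11 = 1 - q / 3
    rate_into_packed = q / 3
    return rate_into_11 == 2 * rate_into_packed


print(rate_condition_holds(q), rate_condition_holds(Fraction(1)))   # False True
```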

Intuitively, one would expect the approximation for $p_F(f \mid k)$ given by (28) through (30) to be accurate for random assignment, and (30) to become more accurate as the ratio $n/m$ of the number of nodes to the number of channels increases. The reason is that the probability that a new arrival can be assigned to a randomly picked empty time slot increases with $n/m$. As a result, the selection of empty slots in the channel-time slot matrix to which new arrivals are assigned becomes less biased. The simulations in Section V indicate that our approximation is in fact extremely accurate even for relatively small systems (e.g., four channels, five time slots, and ten nodes). Fixed-assignment schemes other than random assignment are possible, however, and for these the approximation (30) may not be accurate. For example, it may be desirable to pack the assigned traffic as closely as possible to the left or to the right of the channel-time slot matrix, to increase the probability that some column has a relatively large number of empty slots. For such a scheme, the distribution of slot patterns $p_F(\mathbf{k} \mid k)$ may differ significantly from the distribution resulting from random assignment.

An upper bound on the blocking probability obtained using any fixed-assignment scheme can in principle be derived by calculating a lower bound on the fit probability $p_F(f \mid k)$. From (28),

$$p_F(f \mid k) = \sum_{\mathbf{k} \in \Omega_k} p_F(f \mid \mathbf{k})\, p_F(\mathbf{k} \mid k) \ge \min_{\mathbf{k} \in \Omega_k} p_F(f \mid \mathbf{k}) \equiv p'_F(f \mid k). \qquad (32)$$

A lower bound on $p_F(f \mid k)$ is therefore obtained by assuming that the traffic already assigned is arranged in the configuration that minimizes the probability that a new arrival can be assigned. From Lemma 2 and Theorem 2 we have

$$P_B^E \le P_B^F \le P_B[p'_F(f \mid 0), \ldots, p'_F(f \mid mc)], \qquad (33)$$

where the blocking probability $P_B$, as a function of the fit probabilities, is given by (24). Unfortunately, the expression (29) for $p_F(f \mid \mathbf{k})$ relies on an independence assumption that has not been proven. Consequently, combining (24), (29), and (33) may not constitute a rigorous upper bound on the blocking probability. The derivation of a tight, rigorous lower bound on $p_F(f \mid k)$, and hence of an upper bound on the blocking probability $P_B^F$, therefore remains an open problem.
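The right-hand side of (33) can be computed by brute force for small systems. The sketch below evaluates (32) by minimizing (29) over each $\Omega_k$ and then applies (24). It uses illustrative helper names. As the text warns, (29) rests on the unproven independence assumption, so the resulting number should not be read as a rigorous bound. Enumeration is practical only for small $m$ and $c$.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def fit_probability(vec, n, m):
    """(29) for column counts vec."""
    no_fit = Fraction(1)
    for k_i in vec:
        if k_i < m:
            no_fit *= 1 - Fraction(n - k_i, n) ** 2
    return 1 - no_fit


def min_fit_probabilities(n, m, c):
    """(32): p'_F(f | k) = min over Omega_k of (29), for k = 0..mc."""
    best = {}
    for vec in product(range(m + 1), repeat=c):
        k = sum(vec)
        p = fit_probability(vec, n, m)
        if k not in best or p < best[k]:
            best[k] = p
    return [best[k] for k in range(m * c + 1)]


def blocking_probability(phi, rho):
    """(24) evaluated in floating point."""
    mc = len(phi) - 1
    num, den, prod = 0.0, 0.0, 1.0
    for k in range(mc + 1):
        term = rho**k / factorial(k) * prod
        den += term
        if k < mc:
            num += term * phi[k]
        prod *= phi[k]
    return 1.0 - num / den


n, m, c, rho = 10, 4, 5, 15.0
phi_min = [float(p) for p in min_fit_probabilities(n, m, c)]
upper = blocking_probability(phi_min, rho)                   # right-hand side of (33)
erlang = blocking_probability([1.0] * (m * c) + [0.0], rho)  # P_B^E via (24)
print(erlang, upper)
```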

This completes the presentation of analytical results that can be used to evaluate multichannel TDMA performance using either optimal- or fixed-assignment schemes. To summarize, we have obtained an approximation for the blocking probability resulting from random assignment [given by (24), (28), (29), and (30)] and a lower bound on the blocking probability using optimal assignment (i.e., the Erlang blocking probability, $P_B^E$). These quantities can easily be evaluated with the aid of a computer. In the next section we compare these analytical results with computer simulation results.
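Combining the pieces, the sketch below computes the random-assignment approximation [(24) with $\phi_k$ from (28) through (30)] alongside the Erlang lower bound $P_B^E$. It is our own illustration with hypothetical function names. Because it enumerates every occupancy vector, it is intended only for systems of roughly the size studied in Section V (e.g., $m = 4$, $c = 5$). It uses fractions for $p_F(f \mid k)$ and floats for (24).

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import comb, factorial


def fit_probability(vec, n, m):
    """(29): fit probability given column counts vec."""
    no_fit = Fraction(1)
    for k_i in vec:
        if k_i < m:
            no_fit *= 1 - Fraction(n - k_i, n) ** 2
    return 1 - no_fit


def approx_fit_given_k(n, m, c):
    """(28) with (30): p_F(f | k) ~= sum over Omega_k of (29) * prod C(m, k_i) / C(mc, k)."""
    acc = defaultdict(Fraction)
    for vec in product(range(m + 1), repeat=c):
        weight = 1
        for k_i in vec:
            weight *= comb(m, k_i)
        k = sum(vec)
        acc[k] += fit_probability(vec, n, m) * Fraction(weight, comb(m * c, k))
    return [acc[k] for k in range(m * c + 1)]


def blocking_probability(phi, rho):
    """(24) in floating point; phi[mc] must be 0."""
    mc = len(phi) - 1
    num, den, prod = 0.0, 0.0, 1.0
    for k in range(mc + 1):
        term = rho**k / factorial(k) * prod
        den += term
        if k < mc:
            num += term * phi[k]
        prod *= phi[k]
    return 1.0 - num / den


def erlang_b(servers, rho):
    """Erlang's B formula by the standard recursion."""
    b = 1.0
    for s in range(1, servers + 1):
        b = rho * b / (s + rho * b)
    return b


m, c = 4, 5
for n in (10, 50):
    phi = [float(p) for p in approx_fit_given_k(n, m, c)]
    for rho in (10.0, 15.0, 20.0):
        print(n, rho, erlang_b(m * c, rho), blocking_probability(phi, rho))
```

Because each factor in (29) grows with $n$ and the weights in (30) do not depend on $n$, the approximate $\phi_k$ increase with $n$. By Lemma 2, the approximate blocking probability should therefore move toward the Erlang value as $n$ grows.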


V. NUMERICAL RESULTS 


The analytical results of the last two sections are now illustrated via some specific examples. Figure 3 shows plots of the probability that a new traffic arrival can be assigned, given that there are $k$ units of traffic already present in the channel-time slot matrix [$p_F(f \mid k)$ given by (28) through (30)], vs. $k$ for a system with four channels and five time slots per channel. Curves are shown for systems with 5 nodes, 10 nodes, and 50 nodes. These curves approximate the probability that an incoming traffic arrival can be assigned using random assignment. As the number of nodes increases, the curves converge rapidly to the single-channel (step function) case with $mc$ time slots. The curves computed for a system with 4 channels and 10 time slots per channel were nearly identical to those shown in Fig. 3 and are therefore omitted.

Fig. 3—Probability of a fit vs. number of occupied slots using random assignment for a system with four channels and five time slots.

It is reasonable to expect that if the fit probabilities shown in Fig. 3 are close to the single-channel case, then the corresponding system blocking probabilities should also be close to the analogous single-channel blocking probability. This is indeed the case, as illustrated in Figs. 4 through 7. Each figure plots blocking probability vs. normalized load for the single-channel case (Erlang B formula with $mc$ servers) and for the multichannel case using random and optimal assignment. The optimal-assignment curves were obtained by computer simulation. The random-assignment curves were obtained both analytically, via the approximation described in the last section [(24) and (28) through (30)], and by computer simulation. In all cases the approximate analytical curves are nearly identical to the corresponding simulated curves. Figures 4 and 5 show plots for a system with 4 channels, 5 time slots per channel, and 10 nodes and 50 nodes, respectively. Figures 6 and 7 show analogous plots for systems with 4 channels and 10 time slots per channel.

Fig. 4—Blocking probability vs. offered load for a system with 4 channels, 5 time slots, and 10 nodes.

Fig. 5—Blocking probability vs. offered load for a system with 4 channels, 5 time slots, and 50 nodes.

Figures 4 through 7 indicate that the differences between the simulation results, the analytical approximation, and the lower (single-channel) bound on multichannel blocking probabilities are significant only for systems with a relatively small number of nodes. For the cases shown here, the single-channel system exhibits at most moderate performance improvements over the multichannel random-assignment case. Results obtained for additional cases indicate that if the ratio of nodes to channels ($n/m$) is greater than 10, the difference between the blocking probabilities of a multichannel system with a fixed-assignment algorithm and of an analogous single-channel system is negligible. This condition is likely to be satisfied in many satellite systems. We therefore reach the important conclusion that, for practical systems, the simplest of assignment schemes will perform nearly as well as an optimal-assignment scheme.

Fig. 6—Blocking probability vs. offered load for a system with 4 channels, 10 time slots, and 10 nodes.

Fig. 7—Blocking probability vs. offered load for a system with 4 channels, 10 time slots, and 50 nodes.

We point out that the traffic model used here must be modified in 
order to study duplex voice traffic. In this case each traffic request 
from node i to node j also generates a simultaneous traffic request 
from node j to node i. This alternative traffic model does not require 
major changes in any of our previous arguments, and hence results 
obtained from using this model should correspond with those given 
here. 


VI. CONCLUSIONS 


This paper has provided tools with which to evaluate the perform- 
ance of multichannel TDMA blocking systems. For any multichannel 
assignment scheme (fixed or optimal), a lower bound on system 
blocking probability has been obtained along with an accurate approx- 
imation for the blocking probability resulting from random assign- 
ment. 

The numerical results in Section V indicate that multichannel blocking probability is relatively insensitive to the assignment algorithm used when a moderate number of nodes is present. If the ratio of the number of nodes to the number of channels is 10 or greater, the difference between the blocking probabilities of a multichannel system with a fixed-assignment algorithm and of an analogous single-channel system is negligible. This conclusion is fortunate, since it implies that the assignment algorithm that is simplest to implement will perform nearly optimally.

The results in this paper pertain to networks which handle voice 
traffic only. Of equal interest are analogous results which apply to 
networks handling data traffic. Specifically, it would be useful to know 
whether the performance of a multichannel TDMA queueing system 
is also insensitive to the particular assignment algorithm used. This 
issue requires further investigation. 
APPENDIX A

Steady-State Distribution of the Markov Process Associated with Optimal Assignment

To derive the steady-state distribution of the Markov process associated with optimal assignment, consider a set of $n^2$ independent M/M/∞ queues, labeled by the indices $(i, j)$, with arrival rates $\lambda_{ij}$ and mean service times $\mu_{ij}^{-1}$. Define the Markov process $q_{ij}$ as the number of customers in queue $(i, j)$. Then each process $q_{ij}$ is a birth-death process with steady-state distribution

$$p_{ij}(q_{ij}) = \frac{1}{q_{ij}!}\left(\frac{\lambda_{ij}}{\mu_{ij}}\right)^{q_{ij}} e^{-\lambda_{ij}/\mu_{ij}}, \qquad q_{ij} \ge 0.$$

Thus, the matrix process $Q = [q_{ij}]$ has the steady-state distribution

$$p(Q) = \prod_{i,j} \frac{1}{q_{ij}!}\left(\frac{\lambda_{ij}}{\mu_{ij}}\right)^{q_{ij}} e^{-\lambda_{ij}/\mu_{ij}}, \qquad q_{ij} \ge 0.$$

Now restrict the process $Q$ to the set of states that satisfy the matrix constraints (1), and set $\lambda_{ij} = \lambda/n^2$ and $\mu_{ij} = \mu$. The resulting process is the Markov process $T$ defined in Section II. By Corollary 1.10 in Kelly, the steady-state distribution of $T$ is

$$p(T) = \frac{\prod_{i,j} \frac{1}{t_{ij}!}\left(\frac{\lambda}{n^2 \mu}\right)^{t_{ij}}}{\sum_{T' \in S} \prod_{i,j} \frac{1}{t'_{ij}!}\left(\frac{\lambda}{n^2 \mu}\right)^{t'_{ij}}}, \qquad T \in S, \qquad (34)$$

where $S$ is defined by (2a).† Note that (34) is the conditional probability distribution of the $Q$ process given that the state is contained in $S$, i.e., $p(Q \mid Q \in S)$. Evaluating the denominator in (34) requires an enumeration of all the states in $S$, a formidable task for moderately sized systems.


APPENDIX B 
Proof of Lemma 1 


This appendix contains the proof of Lemma 1. From (4), we have

$$\sum_{T' \ne T} p(T)\, r(T, T') = \sum_{T' \in S_T^-} p(T)\, r(T, T') + \sum_{T' \in S_T^+} p(T)\, r(T, T'). \qquad (35)$$

Substituting (3) for $r(T, T')$ in the first term on the right yields

$$\sum_{T' \in S_T^-} p(T)\, r(T, T') = p(T) \sum_{T' \in S_T^-} r(T, T') = p(T) \sum_{i,j} t_{ij}\, \mu = p(T)\, \mu k, \qquad (36)$$

where $\sum_{i,j} t_{ij} = k$ is the total traffic in $T$. Similarly, for the second term on the right,

$$\sum_{T' \in S_T^+} p(T)\, r(T, T') = p(T) \sum_{T' \in S_T^+} r(T, T') = p(T) \sum_{T' \in S_T^+} \frac{\lambda}{n^2} = p(T)\, \lambda\, |S_T^+| / n^2. \qquad (37)$$

An arrival is modeled as a selection from $n^2$ equally likely possibilities, of which $|S_T^+|$ can be assigned, given that the state is $T$. Therefore, the conditional probability that an arrival can be assigned is

$$p(f \mid T) = |S_T^+| / n^2. \qquad (38)$$

Combining (35) through (38) gives (5).


†This distribution was also derived in Ref. 13.
APPENDIX C
Proof of Theorem 1

To prove Theorem 1, we sum the flow equations in (7) over the set of states with $k$ units of traffic, i.e.,

$$\sum_{T \in S_k} [\mu k\, p(T) + \lambda p(f \mid T)\, p(T)] = \sum_{T \in S_k} \sum_{T' \in S_T^+} p(T')\, r(T', T) + \sum_{T \in S_k} \sum_{T' \in S_T^-} p(T')\, r(T', T). \qquad (39)$$

The left-hand term is

$$\sum_{T \in S_k} [\mu k\, p(T) + \lambda p(f \mid T)\, p(T)] = \mu k \sum_{T \in S_k} p(T) + \lambda \sum_{T \in S_k} p(f \mid T)\, p(T) = \mu k\, p_O(k) + \lambda p_O(f \mid k)\, p_O(k). \qquad (40)$$

We evaluate the terms on the right side of (39) by interchanging the order of summation, which is permissible since the sums are always finite. Because $S_{k+1} = \bigcup_{T \in S_k} S_T^+$, and for $T' \in S_{k+1}$, $r(T', T) \ne 0$ only if $T' \in S_T^+$, interchanging the sums in the first term on the right yields

$$\sum_{T \in S_k} \sum_{T' \in S_T^+} p(T')\, r(T', T) = \sum_{T' \in S_{k+1}} \sum_{T \in S_k} p(T')\, r(T', T).$$

But if $T' \in S_{k+1}$ and $T \in S_k$, then $r(T', T) \ne 0$ only if $T \in S_{T'}^-$. Consequently,

$$\sum_{T' \in S_{k+1}} \sum_{T \in S_k} p(T')\, r(T', T) = \sum_{T' \in S_{k+1}} \sum_{T \in S_{T'}^-} p(T')\, r(T', T) = \sum_{T' \in S_{k+1}} p(T')\, \mu (k+1) = \mu (k+1)\, p_O(k+1), \qquad (41)$$

where the next-to-last step follows from Lemma 1. A similar argument shows that the second term on the right is

$$\sum_{T \in S_k} \sum_{T' \in S_T^-} p(T')\, r(T', T) = \sum_{T' \in S_{k-1}} \sum_{T \in S_k} p(T')\, r(T', T) = \sum_{T' \in S_{k-1}} \sum_{T \in S_{T'}^+} p(T')\, r(T', T) = \sum_{T' \in S_{k-1}} p(T')\, \lambda p(f \mid T') = \lambda p_O(f \mid k-1)\, p_O(k-1). \qquad (42)$$

Combining (40) through (42) yields (9).
APPENDIX D
Proof of Lemma 2

Lemma 2 is stated in the literature, and although some unpublished proofs are referenced in Ref. 14, there does not seem to be a published proof.

We prove this lemma by calculating the partial derivatives of $1 - P_B$ and showing that they are nonnegative in the region of interest. Differentiating (24) gives

$$\frac{\partial (1 - P_B)}{\partial \phi_l} = \frac{b_1(l)\, a_2(l) - a_1(l)\, b_2(l)}{[b_1(l) + b_2(l)\, \phi_l]^2}, \qquad (43)$$

where

$$a_1(l) = \sum_{k=0}^{l-1} \frac{\rho^k}{k!}\, \alpha(k), \qquad b_1(l) = \sum_{k=0}^{l} \frac{\rho^k}{k!}\, \alpha(k-1),$$

$$a_2(l) = \sum_{k=l}^{mc-1} \frac{\rho^k}{k!}\, \beta(k), \qquad b_2(l) = \sum_{k=l+1}^{mc} \frac{\rho^k}{k!}\, \beta(k-1),$$

and

$$\alpha(k) = \prod_{j=0}^{k} \phi_j, \qquad \beta(k) = \prod_{\substack{j=0 \\ j \ne l}}^{k} \phi_j, \qquad \alpha(-1) = \beta(-1) = 1.$$

(In this notation, $1 - P_B = [a_1(l) + \phi_l\, a_2(l)]/[b_1(l) + \phi_l\, b_2(l)]$.) The derivative (43) is nonnegative, and hence $\partial P_B / \partial \phi_l \le 0$, if and only if

$$b_1(l)\, a_2(l) - a_1(l)\, b_2(l) \ge 0. \qquad (44)$$

The expression (44) is a polynomial in $\rho$ with powers ranging from $l$ to $mc + l - 1$. We will show that the coefficient of each power is nonnegative. Let $l \le s \le mc + l - 1$ and define the sets

$$\Omega_1 = \{(k_1, k_2) \mid k_1 + k_2 = s,\ 0 \le k_1 \le l,\ l \le k_2 \le mc - 1\},$$

$$\Omega_2 = \{(k_3, k_4) \mid k_3 + k_4 = s,\ 0 \le k_3 \le l - 1,\ l + 1 \le k_4 \le mc\}.$$

Now the coefficient of $\rho^s$ in (44) may be written as

$$\sum_{(k_1, k_2) \in \Omega_1} \frac{\alpha(k_1 - 1)\, \beta(k_2)}{k_1!\, k_2!} - \sum_{(k_3, k_4) \in \Omega_2} \frac{\alpha(k_3)\, \beta(k_4 - 1)}{k_3!\, k_4!}. \qquad (45)$$

Define the set

$$\Omega_{12} = \{(k_1, k_2) \mid (k_1, k_2) \in \Omega_1,\ k_1 \ge 1\}$$

and note that $(k_3, k_4) \in \Omega_2$ if and only if $k_3 = k_1 - 1$, $k_4 = k_2 + 1$, where $(k_1, k_2) \in \Omega_{12} \subset \Omega_1$. This implies that (45) is equal to $w_1 + w_2$, where

$$w_1 = \sum_{(k_1, k_2) \in \Omega_1 \setminus \Omega_{12}} \frac{\alpha(k_1 - 1)\, \beta(k_2)}{k_1!\, k_2!} \ge 0, \qquad (46a)$$

$$w_2 = \sum_{(k_1, k_2) \in \Omega_{12}} \left[\frac{1}{k_1!\, k_2!} - \frac{1}{(k_1 - 1)!\, (k_2 + 1)!}\right] \alpha(k_1 - 1)\, \beta(k_2). \qquad (46b)$$

A term in the sum $w_2$ is negative if and only if $(k_2 + 1)/k_1 < 1$. But this inequality and the definition of $\Omega_{12}$ imply that $l + 1 \le k_2 + 1 < k_1$, which is impossible because $k_1 \le l$ for $(k_1, k_2) \in \Omega_1$. Consequently, we have $w_2 \ge 0$. This and $w_1 \ge 0$ imply that (44) and (45) are nonnegative; therefore, each partial derivative (43) is nonnegative. Q.E.D.
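The decomposition in (43) through (45) can be checked with exact rational arithmetic. The sketch below is our own, and its function names are hypothetical. It builds $a_1$, $b_1$, $a_2$, $b_2$ as coefficient tables in $\rho$ and confirms that $(a_1 + \phi_l a_2)/(b_1 + \phi_l b_2)$ reproduces $1 - P_B$ from (24) at a sample $\rho$. It then verifies that every coefficient of $b_1 a_2 - a_1 b_2$ is nonnegative for randomly drawn $\phi$'s in $[0, 1]$ with $\phi_{mc} = 0$.

```python
import random
from fractions import Fraction
from math import factorial


def alpha(phi, k):
    """alpha(k) = prod_{j=0}^{k} phi_j, with alpha(-1) = 1."""
    out = Fraction(1)
    for j in range(k + 1):
        out *= phi[j]
    return out


def beta(phi, k, l):
    """beta(k) = prod_{j=0, j != l}^{k} phi_j, with beta(-1) = 1."""
    out = Fraction(1)
    for j in range(k + 1):
        if j != l:
            out *= phi[j]
    return out


def split_polys(phi, l):
    """Coefficient tables {power of rho: coefficient} for a_1, b_1, a_2, b_2."""
    mc = len(phi) - 1
    a1 = {k: alpha(phi, k) / factorial(k) for k in range(0, l)}
    b1 = {k: alpha(phi, k - 1) / factorial(k) for k in range(0, l + 1)}
    a2 = {k: beta(phi, k, l) / factorial(k) for k in range(l, mc)}
    b2 = {k: beta(phi, k - 1, l) / factorial(k) for k in range(l + 1, mc + 1)}
    return a1, b1, a2, b2


def mul(p, q):
    """Multiply two polynomials stored as {power: coefficient}."""
    out = {}
    for i, x in p.items():
        for j, y in q.items():
            out[i + j] = out.get(i + j, 0) + x * y
    return out


def evaluate(p, rho):
    return sum(coef * rho**k for k, coef in p.items())


def one_minus_pb(phi, rho):
    """1 - P_B from (24), exactly."""
    mc = len(phi) - 1
    num = sum(rho**k / factorial(k) * alpha(phi, k) for k in range(mc))
    den = sum(rho**k / factorial(k) * alpha(phi, k - 1) for k in range(mc + 1))
    return num / den


def check(mc=6, trials=50, seed=3):
    rng = random.Random(seed)
    rho = Fraction(7, 2)
    for _ in range(trials):
        phi = [Fraction(rng.randint(0, 10), 10) for _ in range(mc)] + [Fraction(0)]
        for l in range(mc + 1):
            a1, b1, a2, b2 = split_polys(phi, l)
            ratio = (evaluate(a1, rho) + phi[l] * evaluate(a2, rho)) / (
                evaluate(b1, rho) + phi[l] * evaluate(b2, rho))
            if ratio != one_minus_pb(phi, rho):
                return False
            p, q = mul(b1, a2), mul(a1, b2)
            diff = {s: p.get(s, 0) - q.get(s, 0) for s in set(p) | set(q)}
            if any(v < 0 for v in diff.values()):  # coefficients of (44), i.e., (45)
                return False
    return True


print(check())
```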
Many approximations for queueing characteristics such as the mean equi-
librium queue length are based on two moments of the interarrival and service 
times. To evaluate these approximations, we suggest looking at the set of all 
possible values of the queueing characteristics given the specified moment 
parameters. This set-valued function is useful for evaluating the accuracy of 
approximations. For several models, such as the GI/M/1 queue, the set of 
possible values for the mean queue length given limited-moment information 
can be conveniently described by simple extremal distributions. Here we 
calculate the set of possible values for the mean queue length in a GI/M/1 
queue and show how it depends on the traffic intensity and the second moment. 
We also use extremal distributions to compare alternative parameters for 
approximations. The results provide useful insights about approximations for 
non-Markov networks of queues and other complex queueing systems. The 
general procedure is widely applicable to investigate the accuracy of approxi- 
mations. 


I. INTRODUCTION AND SUMMARY 


Queueing models are important tools for studying the performance 
of complex systems, but despite the substantial queueing theory lit- 
erature, it is often necessary to use approximations. The purpose of 
this series of papers is to help develop a theory for evaluating queueing 
approximations. Devising appropriate queueing approximations no
doubt will continue to be largely an art, but we believe that there is a 
need and a real possibility for more supporting theory. 

In this series of papers we examine the accuracy of queueing approximations that are based on a few parameters partially characterizing the arrival process and the service-time distribution. We use an approach originally introduced by Holtzman and Eckberg at Bell Laboratories and Rolski in Poland. Since the approximations apply to all arrival processes and all service-time distributions with the same parameters, we propose evaluating the approximations by examining the set of all possible values of the congestion measure consistent with the specified parameters. To be specific, consider the GI/G/1 queue, which has a single server, unlimited waiting room, the first-come first-served discipline, and a renewal arrival process independent of iid (independent and identically distributed) service times. Many approximations for the equilibrium mean queue length in the GI/G/1 queue are based on the first two moments of the interarrival-time and service-time distributions; see Shanthikumar and Buzacott, and Whitt. In this context we suggest considering the set-valued function that maps the four moment parameters into the set of possible values of the mean queue length.

It should be clear that we are in an excellent position to develop 
and evaluate approximations if we can identify such set-valued func- 
tions. We can see if a candidate approximation is an element of this 
set for all parameters of interest; then there always is a system for 
which the approximation is exact. We can also see if an approximation 
is in the middle of this set; then large errors are avoided and the 
approximation usually corresponds to a typical system value. 

There is also much to be learned without considering any specific 
approximation. The range of values indicates the possible accuracy of 
any approximation. We can investigate how this range depends on the 
parameters to determine how the possible accuracy depends on the 
parameters. We can see how the range is reduced by incorporating 
additional information, e.g., another moment. We can also compare 
different parameter specifications by comparing the different set- 
valued functions. 

This approach has wide applicability in queueing and elsewhere, 
provided that we can indeed identify the desired set-valued functions. 
As one would expect, this task is usually difficult, but there is an 
emerging methodology for attacking this problem. It is sometimes 
possible to identify relatively simple extremal distributions that yield 
the maximum and minimum values of the congestion measure given 
the parameters. A major tool for this purpose is the theory of complete 
Tchebycheff systems in Karlin and Studden. The idea of applying complete Tchebycheff systems and extremal distributions to congestion models is due to Holtzman and Rolski. Eckberg first used this approach to compare alternate parameter specifications, primarily the peakedness versus the variance as a second parameter in addition to the mean in GI/M/s loss systems. Other relevant references are Bergmann et al., Daley and Rolski, Karr, Stoyan, and Whitt.

The principal focus in the papers here is the GI/M/1 queue, which 
has an exponential service-time distribution. (We also have results for 
more general GI/G/1 queues; see Section V of this paper and Sections 
VI and VII of Part II, a subsequent paper in this issue of the Journal.)
In Part I, we describe the set of all possible values of the mean queue 
length in the GI/M/1 model given the service rate and various param- 
eters partially characterizing the interarrival-time distribution, espe- 
cially the first two moments. We obtain useful descriptions of the way 
this set depends on the parameters (see Section II). For example, the 
maximum relative error [defined in (4)] in the mean queue length 
given the first two moments of the interarrival time turns out to be 
precisely the squared coefficient of variation (variance divided by the 
square of the mean) of the interarrival time; see Corollary 1. We also 
evaluate alternate parameter specifications (see Sections III and IV). 

We must emphasize that we are not actually interested in the 
GI/M/1 model itself. Given a GI/M/1 model, it is obviously not 
difficult to calculate the mean queue length exactly. We are actually 
interested in more general models in which exact solutions are not 
possible. Where GI/M/1 models arise, they arise as approximations, 
e.g., the arrival process is approximated by a renewal process partially 
characterized by the first two moments of the renewal interval.
Then there is no corresponding renewal-interval distribution for exact 
analysis. 

We became motivated to conduct this study while developing the software package QNA (Queueing Network Analyzer), which calculates approximate congestion measures for non-Markovian networks of queues, i.e., with non-Poisson arrival processes and nonexponential service-time distributions. The procedure in QNA is, first, to approximate each arrival process by a renewal process partially characterized by the first two moments of the renewal interval and, second, for each node to apply approximation formulas for the congestion measures in a GI/G/m queue partially characterized by the first two moments of the interarrival-time and service-time distributions. It is natural to study these two steps separately. The first step is studied in Whitt and Albin. The second step is studied here.

For the network of queues and other applications, we would actually 
like to treat the more general GI/G/m model, but we are not yet able 
to do this. Nevertheless, we believe that the GI/M/1 results here are 
important. They indicate what happens more generally. While the
exponential distribution is exceptional in its analytic simplicity, it is 
rather typical in its degree of variability (in between deterministic and 
highly variable). Moreover, the sharp analytic results available for the 
GI/M/1 model will be useful theoretical reference points for other 
cases that require relatively complicated numerical methods or simu- 
lation. Even if an extremal distribution is identified for other 
GI/G/m queues, it may be a nontrivial task to calculate the mean 
queue length. 

We emphasize that the relevance of the extremal distributions for 
the GI/M/1 model was established previously. Here we apply this theory
to examine in detail the implications for queueing approximations. 
We determine which parameters are best, how the quality of approx- 
imations depends on the parameters, and how much additional infor- 
mation helps. 

As an important part of our results, we display the extremal distri- 
butions yielding the extreme values of the mean queue length. These 
extremal distributions are of interest beyond the GI/M/1 queue con- 
sidered here because they are also extremal in many other settings. 
(This will be evident from Sections II and V.) Moreover, in settings 
such as the GI/G/m queue in which the actual extremal distributions 
are still unknown, the GI/M/1 extremal distributions can be used in 
numerical methods and simulations to get an approximate range of 
possible congestion values. 

To describe the situation for the GI/M/1 queue, let $u$ be an interarrival time, $v$ a service time, $\rho$ the traffic intensity ($\rho = Ev/Eu$), $c^2$ the squared coefficient of variation of an interarrival time, and $L$ the expected equilibrium queue length (number in system) at an arbitrary time. For the GI/M/1 queue,

$$L = \frac{\rho}{1 - \sigma}, \qquad (1)$$

where $\sigma$ is the unique root in the open interval $(0, 1)$ of the equation

$$\hat\phi[\mu(1 - \sigma)] = \sigma, \qquad (2)$$

with $\mu = 1/Ev$ and $\hat\phi(s)$ the Laplace-Stieltjes transform of the interarrival-time cdf, say $F$,

$$\hat\phi(s) = \int_0^\infty e^{-st}\, dF(t). \qquad (3)$$

The root $\sigma$ in (2) is also of interest in itself, because it is the probability that a customer will have to wait before beginning service. It is clear from (1) and (2) that $\sigma$ and $L$ depend on the entire cdf $F$, not just its first two moments.
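Equations (1) through (3) reduce the GI/M/1 calculation to a one-dimensional root-finding problem. The sketch below takes the interarrival-time transform as a Python function, finds the root $\sigma$ of (2) in $(0, 1)$, and returns $L$ from (1). The function names are hypothetical, and bisection is our choice of method rather than the paper's. It assumes $\rho < 1$, so that the nontrivial root exists. As a check, exponential interarrival times (the M/M/1 queue) should give $\sigma = \rho$ and $L = \rho/(1 - \rho)$.

```python
def solve_sigma(lst, mu, tol=1e-13):
    """Root in (0, 1) of lst(mu * (1 - sigma)) = sigma, found by bisection (assumes rho < 1)."""
    def f(s):
        return lst(mu * (1.0 - s)) - s

    lo, hi = 0.0, 1.0 - 1e-9   # f(lo) > 0; f(hi) < 0 when rho < 1 and hi is close enough to 1
    if not (f(lo) > 0 > f(hi)):
        raise ValueError("no sign change: check that rho < 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_queue_length(lst, mean_interarrival, mean_service):
    """Return (sigma, L) with L = rho / (1 - sigma) from (1), rho = Ev/Eu, mu = 1/Ev."""
    rho = mean_service / mean_interarrival
    sigma = solve_sigma(lst, 1.0 / mean_service)
    return sigma, rho / (1.0 - sigma)


# M/M/1 check: exponential interarrivals with rate 1 (Eu = 1), mean service 0.5 (rho = 0.5).
sigma, L = mean_queue_length(lambda s: 1.0 / (1.0 + s), 1.0, 0.5)
print(sigma, L)
```

Bisection is safe here because $\hat\phi$ is convex, so $\hat\phi[\mu(1 - \sigma)] - \sigma$ changes sign exactly once on $(0, 1)$ when $\rho < 1$.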

So, what about the range of possible values for $\sigma$ and $L$ in the GI/M/1 queue? Unfortunately, the range can be very wide. For example, let $Eu = 2$, $Eu^2 = 12$ (so that $\mathrm{Var}(u) = 8$ and $c^2 = 2$), and $Ev = 4/3$ (so that $\rho = 2/3$). The possible values of $\sigma$ range from 0.417 to 0.806, and the possible values of $L$ range from 1.14 to 3.44, giving a maximum relative error of 200 percent (Table IV).
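The endpoints of this example can be approximately reproduced with two-point interarrival distributions that have exactly $Eu = 2$ and $Eu^2 = 12$. For a two-point law with mass $p$ at $b$ and mass $1 - p$ at $a$, matching mean $m_1$ and variance $V$ gives $p = V/[(b - m_1)^2 + V]$ and $a = (m_1 - pb)/(1 - p)$. Taking $b = (1 + c^2)Eu = 6$ puts the other atom at 0. Letting $b$ grow large pushes the distribution toward a point mass at $Eu$ while both moments stay fixed. We assume that these two choices correspond to the extremal distributions mentioned in the text; the excerpt says only that the extremal distributions are two-point. The sketch simply shows that their $\sigma$ values land near 0.806 and 0.417, and that the spread $(L_{\max} - L_{\min})/L_{\min}$ comes out close to $c^2 = 2$, in line with the 200 percent quoted.

```python
import math


def solve_sigma(lst, mu, tol=1e-13):
    """Root in (0, 1) of lst(mu * (1 - sigma)) = sigma by bisection (assumes rho < 1)."""
    def f(s):
        return lst(mu * (1.0 - s)) - s

    lo, hi = 0.0, 1.0 - 1e-9
    if not (f(lo) > 0 > f(hi)):
        raise ValueError("no sign change: check that rho < 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def two_point_lst(b, mean, var):
    """LST of the two-point law with atom b (prob p) and atom a (prob 1 - p) matching mean and var."""
    p = var / ((b - mean) ** 2 + var)
    a = (mean - p * b) / (1.0 - p)
    if a < -1e-12:
        raise ValueError("b too small: the other atom would be negative")
    a = max(a, 0.0)  # clamp rounding noise when a is exactly 0
    return lambda s: (1.0 - p) * math.exp(-a * s) + p * math.exp(-b * s)


Eu, Eu2, Ev = 2.0, 12.0, 4.0 / 3.0
var, rho, mu = Eu2 - Eu**2, Ev / Eu, 1.0 / Ev
c2 = var / Eu**2                                       # squared coefficient of variation = 2

results = {}
for b in ((1.0 + c2) * Eu, 1e5):                       # atoms {0, 6}, and a far-out atom
    sigma = solve_sigma(two_point_lst(b, Eu, var), mu)
    results[b] = (sigma, rho / (1.0 - sigma))
    print(b, results[b])
```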

This wide range naturally causes us to question the value of the various two-moment approximations. However, the particular distributions yielding the extreme values of $L$ suggest an explanation. These extremal distributions are two-point distributions, so they are obviously very unusual. We would hope that for typical (nice) distributions, $\sigma$ and $L$ would not vary much among interarrival-time distributions with the same moments. In Parts II and III we investigate how much the range is reduced by imposing various shape constraints on the interarrival-time distribution. Part II, by Klincewicz and Whitt, presents a new approach. Since the theory of complete Tchebycheff systems no longer applies with shape constraints, Part II uses nonlinear programming to identify the extreme values of $L$ and the associated extremal interarrival-time distributions given various shape constraints. We believe that Part II is the first investigation of extremal distributions in the presence of shape constraints.

The numerical results in Part II are strikingly similar to the theoretical results in Part I, suggesting that a theory corresponding to Part I can be developed for many kinds of shape constraints. Part III shows how this can be done in one important special case: the theory of complete Tchebycheff systems applies again when the distribution is assumed to be a mixture of exponential distributions.

Overall, this study indicates that two-moment approximations can perform poorly, but that they should perform reasonably well if the distribution is not too irregular. At any rate, numbers are provided so that readers can reach their own conclusions, which may depend on the circumstances.

Here is how the rest of this paper is organized. In Section II we 
study the extremal distributions with the first two moments fixed. In 
Section III we do a similar analysis with the mean and the peakedness 
(the transform evaluated at the service rate) fixed. In Section IV we 
investigate other parameter specifications, including the first three 
moments. Finally, in Section V we briefly discuss extremal distribu- 
tions in other models such as the GI/G/1 queue and the GI/M/1 loss 
system. It is significant that the theory of extremal distributions is 
not limited to the GI/M/1 model. 


II. EXTREMAL DISTRIBUTIONS GIVEN THE FIRST TWO MOMENTS


Consider the set of all probability distributions on the interval [0, bm₁], b < ∞, having first two moments m₁ and m₂ (and no mass at infinity). This is a convex set depending on the three parameters b, m₁, and c², where c² is the squared coefficient of variation: c² = (m₂ − m₁²)/m₁². The set is nonempty provided that b ≥ 1 + c². Two distributions in this set are of particular interest. We call them the upper and lower bounds because they yield the maximum and minimum mean queue lengths, respectively, among interarrival-time distributions in this set. The upper bound is the two-point distribution with mass c²/(1 + c²) on 0 and mass 1/(1 + c²) on m₁(1 + c²), having cdf denoted by F_u. The lower bound is the two-point distribution with mass c²/[c² + (b − 1)²] on bm₁ and mass (b − 1)²/[c² + (b − 1)²] on m₁[1 − c²/(b − 1)], having cdf denoted by F_l. As b → ∞, the lower bound converges in law to the limiting lower bound, which is the one-point distribution with mass 1 on m₁, having cdf denoted by F_l^∞.
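For reference, the extremal distributions just described can be written down directly. The helpers below are hypothetical and normalize m₁ = 1, as the text does later in this section. They check that both distributions have mean 1 and squared coefficient of variation c². Rescaling to m₁ = 2 with b = 10 recovers the lower bound for [m₁, m₂, b] listed in Tables VI and VII: mass 0.9759 at 1.56 and 0.0241 at 20.00.

```python
def upper_bound(c2):
    """F_u with m1 = 1: (points, probabilities)."""
    return [0.0, 1.0 + c2], [c2 / (1 + c2), 1 / (1 + c2)]

def lower_bound(c2, b):
    """F_l with m1 = 1 on [0, b]; the reference set is empty unless b >= 1 + c2."""
    if b < 1 + c2:
        raise ValueError("need b >= 1 + c2")
    d = c2 + (b - 1) ** 2
    return [1 - c2 / (b - 1), b], [(b - 1) ** 2 / d, c2 / d]

def mean_and_scv(points, probs):
    """Mean and squared coefficient of variation of a discrete distribution."""
    m1 = sum(p * x for x, p in zip(points, probs))
    m2 = sum(p * x * x for x, p in zip(points, probs))
    return m1, (m2 - m1 ** 2) / m1 ** 2

pts, prs = lower_bound(2.0, 10.0)
print([round(2 * x, 2) for x in pts], [round(p, 4) for p in prs])  # [1.56, 20.0] [0.9759, 0.0241]
```

At b = 1 + c² the two functions return the same distribution, which matches the observation later in this section that F_l and F_u coincide there.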

Note that the limiting lower bound is not actually in the reference set because it has zero variance. These distributions are especially useful because they are minimal and maximal elements for a partial ordering of the distributions based on the Laplace-Stieltjes transforms.

Definition 1: F₁ ≤_t F₂ for two cdf's on [0, ∞) if φ₁(s) ≤ φ₂(s) for all s ≥ 0, where φᵢ is the Laplace-Stieltjes transform of Fᵢ defined in (3).

Since the transform φ(s) is the expectation of a decreasing function, the smaller cdf in the ordering ≤_t tends to have what we would normally think of as the stochastically larger distribution. In fact, in Section 1.8 of Stoyan, F₁ ≤_L F₂ is said to hold if φ₁(s) ≥ φ₂(s) for all s ≥ 0. However, smaller interarrival times mean more arrivals and more congestion. We use this definition because the upper- (lower-) bound distribution then yields the maximum (minimum) mean queue length.

Let ℱ = ℱ(m₁, c², b) be the set of all cdf's with parameters m₁, c², and b. Let F_u and F_l be the cdf's in ℱ associated with the special extremal distributions, and let F_l^∞ be the associated limiting lower-bound cdf. The following proposition is just a restatement of 2.1.1 of Eckberg, which in turn is an elementary consequence of the theory of complete Tchebycheff systems.

Proposition 1: For all F ∈ ℱ, F_l^∞ ≤_t F_l ≤_t F ≤_t F_u.

It is a simple matter to check the following property.

Proposition 2: F_l decreases in ≤_t as b increases, and φ_l(s) → φ_l^∞(s) for all s as b → ∞.

As noted by Holtzman, Rolski, and Eckberg, the ordering ≤_t and the extremal distributions have immediate application to queues. Consider the GI/M/1 queue with fixed service rate μ and interarrival-time distributions in ℱ. Without loss of generality, assume m₁ = 1. Now it is natural to work with the three parameters ρ, c², and b. Let L and σ in (1) and (2) be indexed to indicate the extremal interarrival-time distributions. As an immediate consequence of Proposition 1 and (2), we have

Proposition 3: For all F ∈ ℱ(ρ, c², b), σ_l^∞ ≤ σ_l ≤ σ ≤ σ_u and L_l^∞ ≤ L_l ≤ L ≤ L_u.
Remark 1: More generally, if F₁ ≤_t F₂ for two interarrival-time cdf's, then σ₁ ≤ σ₂ in the associated GI/M/1 queues with a common service rate. This in turn implies not only that L₁ ≤ L₂ but also that the associated steady-state queue-length distributions are stochastically ordered; see Theorem 5.2.3b of Stoyan.

For approximations, it is interesting to know the maximum relative error (MRE) in L, defined by

$$\mathrm{MRE} = \mathrm{MRE}(\rho, c^2, b) = \frac{L_u - L_l}{L_l}. \qquad (4)$$

From (1), we see that MRE = (σ_u − σ_l)/(1 − σ_u).

Now we show how the extremal queue characteristics (σ_l, L_l, etc.) and MRE depend on the parameters ρ, c², and b. We first describe how σ_l^∞ depends on ρ, the only relevant parameter for the limiting lower bound.


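Theorem 1: For 0 < ρ < 1, σ_l^∞ < ρ and

$$\frac{d\sigma_l^\infty}{d\rho} = \frac{(1 - \sigma_l^\infty)\, e^{-(1 - \sigma_l^\infty)/\rho}}{\rho^2 - \rho\, e^{-(1 - \sigma_l^\infty)/\rho}} > 0.$$

Proof: Consider eq. (2) for F_l^∞. With m₁ = 1 it reads σ = e^{−(1−σ)/ρ}. The function

$$f(x) = x - e^{-(1 - x)/\rho} \qquad (5)$$

is negative for 0 < x < σ_l^∞ and positive for σ_l^∞ < x < 1. So to show that σ_l^∞ < ρ it suffices to show that f(ρ) = ρ − e^{−(1−ρ)/ρ} is positive for 0 < ρ < 1. Make the change of variables y = (1 − ρ)/ρ to obtain f(ρ) = (1 + y)⁻¹ − e^{−y}, which is clearly positive for all y > 0. To verify the inequality for the derivative, differentiate the identity f(σ_l^∞) = 0, with f in (5), implicitly with respect to ρ. Use σ_l^∞ < ρ to show that the denominator is always positive:

$$\rho^{-1} e^{-(1 - \sigma_l^\infty)/\rho} < \rho^{-1} e^{-(1 - \rho)/\rho} < 1.$$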
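The derivative formula in Theorem 1 is easy to check numerically. The sketch below assumes m₁ = 1, so that μ = 1/ρ and φ(s) = e^{−s} for F_l^∞. It compares the closed form with a central finite difference and also confirms σ_l^∞ < ρ. The step size h and the helper names are arbitrary, hypothetical choices.

```python
import math

def solve_sigma(phi, mu, tol=1e-14, max_iter=1_000_000):
    # Fixed-point iteration for eq. (2), started at 0 (assumes rho < 1).
    sigma = 0.0
    for _ in range(max_iter):
        nxt = phi(mu * (1.0 - sigma))
        if abs(nxt - sigma) < tol:
            return nxt
        sigma = nxt
    raise RuntimeError("fixed-point iteration did not converge")

def sigma_inf(rho):
    """sigma_l^infinity(rho): deterministic interarrival times equal to m1 = 1."""
    return solve_sigma(lambda s: math.exp(-s), 1.0 / rho)

def dsigma_drho(rho):
    """Closed form for the derivative from Theorem 1."""
    s = sigma_inf(rho)
    e = math.exp(-(1 - s) / rho)
    return (1 - s) * e / (rho ** 2 - rho * e)

h = 1e-6
results = {}
for rho in (0.3, 0.6, 0.9):
    fd = (sigma_inf(rho + h) - sigma_inf(rho - h)) / (2 * h)   # central difference
    results[rho] = (sigma_inf(rho), dsigma_drho(rho), fd)
    print(rho, results[rho])
```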


We now show that all results for σ_l^∞ immediately imply results for σ_u.

Theorem 2: σ_u = 1 − (1 − σ_l^∞)/(1 + c²).

Proof: For the upper bound, eq. (2) is

$$\frac{c^2}{c^2 + 1} + \frac{1}{c^2 + 1}\, e^{-(1 - \sigma_u)(1 + c^2)/\rho} = \sigma_u.$$

Multiply both sides by c² + 1, subtract c² from both sides, and then make the change of variables 1 − σ_l^∞ = (1 − σ_u)(1 + c²) to obtain eq. (2) for the limiting lower bound.
Corollary 1: L_u = L_l^∞(1 + c²) and MRE(ρ, c², ∞) = c².

Remark: Theorems 1 and 2 together with (1) imply that σ_u and L_u are increasing in ρ.
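Theorem 2 and Corollary 1 can be spot-checked by solving (2) separately for F_u and F_l^∞. The sketch assumes m₁ = 1 (so μ = 1/ρ) and uses the same hypothetical fixed-point solver as before. Its rounded values for c² = 2 can be compared with the upper-bound column of Table II.

```python
import math

def solve_sigma(phi, mu, tol=1e-14, max_iter=1_000_000):
    # Fixed-point iteration for eq. (2), started at 0 (assumes rho < 1).
    sigma = 0.0
    for _ in range(max_iter):
        nxt = phi(mu * (1.0 - sigma))
        if abs(nxt - sigma) < tol:
            return nxt
        sigma = nxt
    raise RuntimeError("fixed-point iteration did not converge")

def sigma_limiting_lower(rho):
    # F_l^inf puts all mass on m1 = 1, so phi(s) = exp(-s).
    return solve_sigma(lambda s: math.exp(-s), 1.0 / rho)

def sigma_upper(rho, c2):
    # F_u: mass c2/(1+c2) on 0 and 1/(1+c2) on 1 + c2.
    phi = lambda s: c2 / (1 + c2) + math.exp(-s * (1 + c2)) / (1 + c2)
    return solve_sigma(phi, 1.0 / rho)

gaps = []
for rho in (0.2, 0.5, 0.7, 0.9):
    s_inf = sigma_limiting_lower(rho)
    for c2 in (0.5, 2.0, 4.0):
        s_u = sigma_upper(rho, c2)
        gaps.append(abs(s_u - (1 - (1 - s_inf) / (1 + c2))))    # Theorem 2
        gaps.append(abs((1 - s_inf) / (1 - s_u) - (1 + c2)))    # Corollary 1: L_u / L_l^inf
print(max(gaps))   # close to 0, limited by the solver tolerance
print([round(sigma_upper(r, 2.0), 3) for r in (0.2, 0.5, 0.7, 0.9)])  # [0.669, 0.734, 0.822, 0.936]
```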

We now turn to the lower bound when there is a bound on the distribution (b < ∞). Straightforward but tedious calculations (differentiation) verify the expected monotonicity properties:

Theorem 3: (a) The lower-bound characteristics σ_l and L_l are increasing in ρ and c² and decreasing in b. (b) MRE(ρ, c², b) is increasing in b.

Combining Theorems 2 and 3b, we obtain

Corollary 2: MRE(ρ, c², b) < c².

Numerical evaluation of MRE(ρ, c², b) for 14 values of ρ, 4 values of c², and 5 values of b supports the following conjecture.

Conjecture 1: MRE(ρ, c², b) is decreasing in ρ.

In Table I we display MRE(ρ, c², b) for three values of ρ, four values of c², and four values of b. These cases show that MRE(ρ, c², b) is strongly affected by each of the parameters ρ, c², and b. The bound b can make a big difference, especially for larger ρ and c²: see the case ρ = 0.9 and c² = 4. The same cases show that MRE(ρ, c², b) is not monotone in c². In fact, when c² increases with b fixed, the lower-bound distribution F_l eventually coincides with the upper-bound distribution F_u, becoming the two-point distribution with mass b⁻¹ on b and mass 1 − b⁻¹ on 0 (m₁ = 1). Of course, as c² → 0, F_l and F_u both approach F_l^∞, so that MRE(ρ, c², b) → 0 as c² → 0 too. The numerical results also support the following conjecture:

Conjecture 2: MRE(ρ, c², b) is unimodal in c².
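The entries of Table I can be recomputed from (2) and (4). The following sketch assumes m₁ = 1 (so μ = 1/ρ), builds F_u and F_l from their definitions in this section, and evaluates MRE = (σ_u − σ_l)/(1 − σ_u). The helper names are hypothetical. The sketch reproduces, for example, the entry 0.373 for ρ = 0.5, c² = 0.5, b = 5. It also reproduces the zero entries at b = 1 + c² = 5, where F_l and F_u coincide.

```python
import math

def solve_sigma(phi, mu, tol=1e-14, max_iter=1_000_000):
    # Fixed-point iteration for eq. (2), started at 0 (assumes rho < 1).
    sigma = 0.0
    for _ in range(max_iter):
        nxt = phi(mu * (1.0 - sigma))
        if abs(nxt - sigma) < tol:
            return nxt
        sigma = nxt
    raise RuntimeError("fixed-point iteration did not converge")

def discrete_lst(points, probs):
    # Transform (3) of a distribution with finitely many atoms.
    return lambda s: sum(p * math.exp(-s * x) for x, p in zip(points, probs))

def mre(rho, c2, b):
    """MRE(rho, c2, b) of eq. (4) with m1 = 1, computed as (sigma_u - sigma_l)/(1 - sigma_u)."""
    mu = 1.0 / rho
    phi_u = discrete_lst([0.0, 1 + c2], [c2 / (1 + c2), 1 / (1 + c2)])
    d = c2 + (b - 1) ** 2
    phi_l = discrete_lst([1 - c2 / (b - 1), b], [(b - 1) ** 2 / d, c2 / d])
    s_u, s_l = solve_sigma(phi_u, mu), solve_sigma(phi_l, mu)
    return (s_u - s_l) / (1 - s_u)

print(round(mre(0.5, 0.5, 5.0), 3))     # 0.373
print(mre(0.5, 4.0, 5.0))               # 0.0 (F_l and F_u coincide when b = 1 + c2)
print([round(mre(0.5, 1.0, b), 3) for b in (5.0, 10.0, 20.0, 40.0)])
```

The last line illustrates Theorem 3b and Corollary 2: the values increase with b and stay below c² = 1.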

We now investigate how the extremal queue characteristics and 


Table I—Values of MRE(ρ, c², b) for the GI/M/1 queue*

Traffic        Squared coefficient    Bound b on interarrival-time distribution
intensity ρ    of variation c²        (multiples of the mean)
                                      b = 5    b = 10   b = 20   b = 40
0.5            0.5                    0.373    0.442    0.472    0.487
               1.0                    0.604    0.833    0.924    0.964
               2.0                    0.527    1.40     1.75     1.89
               4.0                    0.000    1.28     3.01     3.59
0.7            0.5                    0.231    0.350    0.424    0.462
               1.0                    0.290    0.583    0.791    0.897
               2.0                    0.185    0.699    1.33     1.68
               4.0                    0.000    0.349    1.56     2.86
0.9            0.5                    0.070    0.143    0.248    0.353
               1.0                    0.072    0.174    0.365    0.610
               2.0                    0.043    0.143    0.374    0.858
               4.0                    0.000    0.071    0.232    0.712

* The maximum relative error in the steady-state mean queue length L given the traffic intensity ρ, the interarrival-time squared coefficient of variation c², and the bound b on the interarrival-time distribution (in multiples of the mean); see Section IV.
MRE(ρ, c², b) behave in light and heavy traffic, i.e., as ρ → 0 and ρ → 1. As an easy consequence of (2), we obtain

Theorem 4: As ρ → 0, σ_u → c²/(1 + c²), σ_l → 0, and MRE(ρ, c², b) → c².


We describe the behavior as ρ → 1 for b < ∞ in more detail. The following result provides an interesting refinement to the classical heavy-traffic limit theorem, from which we can deduce that (1 − ρ)L → (1 + c²)/2 as μ approaches λ from above for any fixed renewal arrival process.


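Theorem 5: For all b,

$$\frac{1 - \sigma_u}{\rho} = \frac{2(1 - \rho)}{1 + c^2} + \frac{4(1 - \rho)^2}{3(1 + c^2)} + O\bigl((1 - \rho)^3\bigr) \qquad (6)$$

and, for b < ∞,

$$\frac{1 - \sigma_l}{\rho} = \frac{2(1 - \rho)}{1 + c^2} + \frac{4 m_3 (1 - \rho)^2}{3(1 + c^2)^3} + O\bigl((1 - \rho)^3\bigr), \qquad (7)$$

where m₃ is the third moment of the lower-bound distribution F_l,

$$m_3 = \frac{c^2 b^3}{c^2 + (b - 1)^2} + \frac{(b - 1 - c^2)^3}{(b - 1)\,[c^2 + (b - 1)^2]}, \qquad (8)$$

so that, for b < ∞,

$$\lim_{\rho \to 1} \frac{\mathrm{MRE}(\rho, c^2, b)}{1 - \rho} = \frac{2}{3}\left[\frac{m_3}{(1 + c^2)^2} - 1\right]. \qquad (9)$$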


Proof: Let x = (1 − σ_l^∞)/ρ. To find the derivatives of x with respect to ρ, write eq. (2) for F_l^∞ (with m₁ = 1) in terms of x, i.e.,

$$1 - \rho x = e^{-x} = 1 - x + \frac{x^2}{2} - \frac{x^3}{6} + O(x^4),$$

or

$$\rho = 1 - \frac{x}{2} + \frac{x^2}{6} + O(x^3).$$

After successive differentiation with L'Hospital's rule, this yields x'(1) = −2 and x''(1) = 8/3, so that x = 2(1 − ρ) + (4/3)(1 − ρ)² + O((1 − ρ)³). From Taylor's theorem and Theorem 2, which gives (1 − σ_u)/ρ = x/(1 + c²), we obtain (6). The calculation for the lower bound in (7) is similar.

Remarks: It is possible to check the consistency of (6) and (7) because they must agree as b → 1 + c². It is not possible to do a consistency check as b → ∞ because the two iterated limits involving b → ∞ and ρ → 1 are not equal.
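The expansions in Theorem 5 can be checked numerically near ρ = 1. The sketch below assumes m₁ = 1 and uses hypothetical helper names. It compares both sides of (6) at ρ = 0.99, where they should differ only by a term of order (1 − ρ)³. It also compares MRE/(1 − ρ) at ρ = 0.999 with the limit (9), using m₃ from (8). For c² = 1 and b = 5, (8) gives m₃ = 7.75, so the limit in (9) is (2/3)(7.75/4 − 1) = 0.625.

```python
import math

def solve_sigma(phi, mu, tol=1e-14, max_iter=10_000_000):
    # Fixed-point iteration for eq. (2), started at 0 (slow but reliable near rho = 1).
    sigma = 0.0
    for _ in range(max_iter):
        nxt = phi(mu * (1.0 - sigma))
        if abs(nxt - sigma) < tol:
            return nxt
        sigma = nxt
    raise RuntimeError("fixed-point iteration did not converge")

def discrete_lst(points, probs):
    return lambda s: sum(p * math.exp(-s * x) for x, p in zip(points, probs))

def extremal_sigmas(rho, c2, b):
    """(sigma_u, sigma_l) for F_u and F_l with m1 = 1, so mu = 1/rho."""
    mu = 1.0 / rho
    phi_u = discrete_lst([0.0, 1 + c2], [c2 / (1 + c2), 1 / (1 + c2)])
    d = c2 + (b - 1) ** 2
    phi_l = discrete_lst([1 - c2 / (b - 1), b], [(b - 1) ** 2 / d, c2 / d])
    return solve_sigma(phi_u, mu), solve_sigma(phi_l, mu)

def third_moment_lower(c2, b):
    """m3 of F_l, eq. (8)."""
    d = c2 + (b - 1) ** 2
    return c2 * b ** 3 / d + (b - 1 - c2) ** 3 / ((b - 1) * d)

c2, b = 1.0, 5.0

# Expansion (6) at rho = 0.99.
eps = 0.01
s_u, _ = extremal_sigmas(1 - eps, c2, b)
lhs6 = (1 - s_u) / (1 - eps)
rhs6 = 2 * eps / (1 + c2) + 4 * eps ** 2 / (3 * (1 + c2))
print(lhs6, rhs6)

# Limit (9) at rho = 0.999.
eps = 0.001
s_u, s_l = extremal_sigmas(1 - eps, c2, b)
ratio9 = (s_u - s_l) / (1 - s_u) / eps
limit9 = 2 / 3 * (third_moment_lower(c2, b) / (1 + c2) ** 2 - 1)
print(ratio9, limit9)
```

With a minus sign on the second-order term of (6), the two printed values at ρ = 0.99 would differ by about 1.3 × 10⁻⁴, which is far more than the cubic remainder.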

We conclude this section by displaying in Table II the extremal 
Table II—The extremal GI/M/1 characteristics for fixed traffic intensity ρ, squared coefficient of variation c², and bound on the distribution b: Case of c² = 2.0

Traffic        Upper-bound          Lower-bound characteristics, by bound b on the
intensity ρ    characteristics      interarrival-time distribution (multiples of the mean)
                                    b = 5          b = 10         b = 20         b = 40
0.2            σ_u = 0.669          σ_l = 0.092    σ_l = 0.022    σ_l = 0.012    σ_l = 0.009
               L_u = 0.604          L_l = 0.220    L_l = 0.204    L_l = 0.202    L_l = 0.202
0.5            σ_u = 0.734          σ_l = 0.594    σ_l = 0.361    σ_l = 0.269    σ_l = 0.233
               L_u = 1.88           L_l = 1.23     L_l = 0.783    L_l = 0.684    L_l = 0.652
0.7            σ_u = 0.822          σ_l = 0.790    σ_l = 0.698    σ_l = 0.585    σ_l = 0.524
               L_u = 3.94           L_l = 3.32     L_l = 2.32     L_l = 1.69     L_l = 1.47
0.9            σ_u = 0.936          σ_l = 0.933    σ_l = 0.926    σ_l = 0.912    σ_l = 0.880
               L_u = 13.98          L_l = 13.41    L_l = 12.23    L_l = 10.17    L_l = 7.52


characteristics σ_l and L_l, and σ_u and L_u, for the cases in Table I with c² = 2. The associated maximum relative errors for ρ = 0.5, 0.7, and 0.9 are given in Table I. These will be compared with other parameter specifications in the following sections.


III. THE SECOND PARAMETER: VARIANCE VERSUS PEAKEDNESS


The first two moments are natural parameters if two parameters are to be used to partially characterize an interarrival-time or a service-time distribution, but it is not clear that these are the best two parameters. Of course, the chosen parameters should be easy to estimate and easy to use in approximations for queues. Also, the parameters should have power in determining descriptive queue characteristics; i.e., there should be a small MRE or a small range of possible values of L. In this regard, Eckberg has shown that, as a second parameter in addition to the mean, the peakedness of a renewal arrival process is much better than the variance for GI/M/k loss systems and also, to some extent, for GI/M/k delay systems. The peakedness is the ratio of the variance to the mean of the steady-state number of busy servers in an associated GI/M/∞ system; see Holtzman, Eckberg, and references there. Knowing the peakedness of a renewal process, say z, is equivalent to knowing φ(μ), the transform evaluated at the service rate μ:

$$\phi(\mu) = 1 - (z + \lambda/\mu)^{-1}, \qquad (10)$$

where λ = 1/Eu is the arrival rate.


The peakedness is an important parameter to consider because it is often available as an approximate characterization of overflow processes via the equivalent random method. Since Eckberg's results suggest that the mean and the parameter φ(μ) might be much better than the mean and variance, we investigate this new parameter pair here.
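Relation (10) and its inverse are simple to code. In the sketch below the function names are hypothetical. λ denotes the arrival rate and μ the service rate. The inverse z = 1/(1 − φ(μ)) − λ/μ follows algebraically from (10). A Poisson process has peakedness 1, which gives φ(μ) = λ/(λ + μ), the transform of an exponential distribution with rate λ. Deterministic (smooth) arrivals should give z < 1.

```python
import math

def phi_from_peakedness(z, lam, mu):
    """Transform value phi(mu) of the interarrival-time cdf implied by (10)."""
    return 1.0 - 1.0 / (z + lam / mu)

def peakedness_from_phi(phi_mu, lam, mu):
    """Inverse of (10): peakedness of a renewal process with respect to service rate mu."""
    return 1.0 / (1.0 - phi_mu) - lam / mu

lam, mu = 0.5, 0.75    # mean interarrival time 2 and rho = 2/3, as for Prototype I in Table IV
print(phi_from_peakedness(1.0, lam, mu), lam / (lam + mu))   # Poisson check
print(peakedness_from_phi(0.5098, lam, mu))                  # z implied by phi(mu) in Table IV
print(peakedness_from_phi(math.exp(-mu * 2.0), lam, mu))     # deterministic arrivals: z < 1
```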
However, before examining this new parameter pair, we explain why the variance might be a better second parameter for single-server delay systems. Knowing the mean and variance (i.e., c²) is equivalent to knowing the first two derivatives of the transform φ(s) at 0. It is intuitively reasonable that we might pin down the transform φ(s) better by fixing its value at μ, φ(μ), than by fixing the second derivative at 0, φ''(0). However, this depends on the way the queue characteristics depend on the transform. For the GI/M/k loss system, the relevant parameters are φ(jμ) for j = 1, 2, ..., k, with the parameters tending to be of less importance as j increases. These parameters are values of the transform φ(s) evaluated at points s such that s ≥ μ. For approximations, it is clearly better to specify φ(μ) and φ'(0) than φ''(0) and φ'(0).

For the GI/M/1 delay system the key parameter in (2) is the transform value φ[μ(1 − σ)]. Of course, we do not know σ in advance, but the argument is always less than μ. Since σ tends to be near ρ, the argument tends to be near μ(1 − ρ), which approaches 0 as ρ → 1. Hence, for large ρ, knowing φ''(0) should be better than knowing φ(μ). On the other hand, for small ρ the argument is close to μ, and knowing φ(μ) should be better than knowing φ''(0).

Our results substantiate this intuitive reasoning. In marked contrast to GI/M/k loss systems, for GI/M/1 delay systems the parameter φ(μ) is not uniformly better than the variance as a second parameter. Which second parameter is better depends on the traffic intensity, with the variance improving as ρ increases. Consistent with the intuitive discussion above, we shall show that the asymptotic behavior of the maximum relative error as ρ approaches 0 and 1 is strikingly different given φ(μ) instead of c². Moreover, the variance does better for the upper bound, whereas φ(μ) does better for the lower bound.

It is also appropriate to mention that we are considering the peakedness of the renewal arrival process as a single parameter, which by (10) can be represented as the transform φ evaluated at the service rate μ. If, instead, we knew the peakedness as a function of the service rate as in Eckberg, then we would know the entire transform, which is equivalent to knowing the entire interarrival-time distribution. Moreover, if we could choose one argument of the transform, then we obviously could do better by picking a value less than μ. For example, there would be no error for the GI/M/1 queue if we could guess σ and make the argument μ(1 − σ). If we could choose one argument given only the arrival rate and service rate, then a natural choice would be μ(1 − ρ). (This parameter is considered here in Section IV.) In applications, however, we typically have no choice. The arrival process (which may not be renewal) may then be partially characterized (by the equivalent random method and related techniques) by its rate and peakedness. Moreover, the given peakedness might be with respect to a different service rate (or even a different service-time distribution). In the context of the GI/M/1 queue, this peakedness parameter will lead to better approximations if the argument of the transform, after using (10), is close to μ(1 − σ). The parameter φ(μ) considered here should give some idea about what will happen in general.

The new parameter pair involving φ(μ) leads to new two-point extremal distributions and a new partial ordering of the distributions. Now consider the set of all probability distributions on the interval [0, bm₁], b < ∞, having first moment m₁ and transform value φ(μ) at s = μ (and no mass at infinity). This is a convex set depending on the parameters b, m₁, and φ(μ). The extremal distributions here are the upper bound and the lower bound. The upper bound is the two-point distribution with mass p = (b − 1)/(b − x) on xm₁ and mass 1 − p on bm₁, where x satisfies

$$p\, e^{-x/\rho} + (1 - p)\, e^{-b/\rho} = \phi(\mu); \qquad (11)$$

The lower bound is the two-point distribution with mass 1 − x⁻¹ on 0 and mass x⁻¹ on xm₁, where x > 0 satisfies

$$x = \frac{1 - e^{-x/\rho}}{1 - \phi(\mu)}. \qquad (12)$$

(In (11) and (12), x/ρ = μxm₁ because ρ = 1/(μm₁).)

Unlike Section II, the upper bound here depends on b while the lower bound does not. As b → ∞, the upper bound converges in law to a limiting upper bound, which is the one-point distribution with mass 1 on −(log φ(μ))/μ. Note that the limiting upper bound is not actually in the reference set because its mean is not m₁. These distributions are minimal and maximal elements for another partial ordering of the distributions based on the transform.


Definition 2: F₁ ≤_μ F₂ for two cdf's on [0, ∞) if

$$\phi_1(s) \le \phi_2(s),\ s \le \mu, \quad\text{and}\quad \phi_1(s) \ge \phi_2(s),\ s \ge \mu.$$


Let 𝒢 = 𝒢(m₁, μ, φ(μ), b) be the set of all cdf's with parameters m₁, μ, φ(μ), and b. Without loss of generality, let m₁ = 1. Let G_u, G_l, and G_u^∞ be the cdf's associated with the special extremal distributions (upper bound, lower bound, and limiting upper bound). From Section 2.2.3 and (5) of Eckberg, we obtain

Proposition 4: For all G ∈ 𝒢, G_l ≤_μ G ≤_μ G_u ≤_μ G_u^∞.

It is easy to see the effect of changing b:

Proposition 5: G_u increases in ≤_μ as b increases, and φ_u(s) → φ_u^∞(s) for each s as b → ∞.

Here are the implications for the GI/M/1 queue. A tilde is used to indicate that the extremal distributions are from this section (because we want to relate them to those in Section II).

Proposition 6: For all G ∈ 𝒢,

σ̃_l ≤ σ ≤ σ̃_u ≤ σ̃_u^∞ and L̃_l ≤ L ≤ L̃_u ≤ L̃_u^∞.
Using the same change-of-variables argument as in Theorem 2, we can express σ̃_l in terms of σ_l^∞.

Theorem 6: σ̃_l = 1 − (1 − σ_l^∞)/x for x in (12).

Remarks: As a consequence of Theorem 6, L̃_l = xL_l^∞ for x in (12). Since x⁻¹ is a probability, σ_l^∞ ≤ σ̃_l and L_l^∞ ≤ L̃_l. Moreover, because x in (12) increases with φ(μ), σ̃_l and L̃_l are increasing in φ(μ) for fixed μ and ρ. Finally, we can combine Theorems 2 and 6 to obtain σ̃_l ≤ σ_u and L̃_l ≤ L_u; use the fact that x ≤ 1 + c².
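Theorem 6 can be checked directly by constructing the lower-bound distribution from (12). The sketch below assumes m₁ = 1, so that μ = 1/ρ and x/ρ = μx. It finds the root x ≥ 1 of (12) by bisection. Bisection is our choice of method, based on the fact that (1 − e^{−x/ρ})/x decreases in x. The sketch then solves (2) both for the two-point distribution itself and via Theorem 6. With the transform value φ(μ) = 0.377 listed for ρ = 0.2 in Table III, it gives σ̃_l ≈ 0.381, the value in that table. The helper name lower_x is hypothetical.

```python
import math

def solve_sigma(phi, mu, tol=1e-14, max_iter=1_000_000):
    # Fixed-point iteration for eq. (2), started at 0 (assumes rho < 1).
    sigma = 0.0
    for _ in range(max_iter):
        nxt = phi(mu * (1.0 - sigma))
        if abs(nxt - sigma) < tol:
            return nxt
        sigma = nxt
    raise RuntimeError("fixed-point iteration did not converge")

def lower_x(phi_mu, rho):
    """Root x >= 1 of (12) with m1 = 1: (1 - exp(-x/rho)) / x = 1 - phi(mu).

    Requires exp(-1/rho) <= phi_mu < 1, which holds for any mean-1 distribution.
    """
    target = 1.0 - phi_mu
    k = lambda x: (1.0 - math.exp(-x / rho)) / x    # decreasing in x
    lo, hi = 1.0, 1.0 / target                      # k(lo) >= target > k(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if k(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rho, phi_mu = 0.2, 0.377
mu = 1.0 / rho
x = lower_x(phi_mu, rho)
# Two-point lower bound: mass 1 - 1/x on 0 and 1/x on x.
sig_direct = solve_sigma(lambda s: (1 - 1 / x) + math.exp(-s * x) / x, mu)
sig_inf = solve_sigma(lambda s: math.exp(-s), mu)       # sigma_l^infinity(rho)
sig_thm6 = 1 - (1 - sig_inf) / x
print(round(x, 3), round(sig_direct, 3), round(sig_thm6, 3))
```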

We now consider the upper-bound characteristic σ̃_u^∞. Let σ_l^∞(ρ) be the limiting lower bound in Section II as a function of ρ.

Theorem 7: If φ(μ) ≥ e⁻¹, then the GI/M/1 queue based on G_u^∞ is unstable and σ̃_u^∞ = 1 is the only root. If φ(μ) < e⁻¹, then

$$\tilde\sigma_u^\infty = \sigma_l^\infty\bigl(-1/\log \phi(\mu)\bigr). \qquad (13)$$

(G_u^∞ puts all its mass on −(log φ(μ))/μ, so its traffic intensity is −1/log φ(μ).)

Remarks: As a consequence of Theorems 1 and 7, if φ(μ) < e⁻¹, then σ̃_u^∞ < −1/log φ(μ) and σ̃_u^∞ is increasing in φ(μ).

Paralleling Theorem 7, we have (omitting the proof)

Theorem 8: (a) The characteristics σ̃_u and L̃_u are increasing in φ(μ) and increasing in b. (b) MRE(ρ, μ, φ(μ), b) is increasing in b.

We now consider limits as the traffic intensity ρ approaches 0 and 1. Here we assume the transform is based on a fixed interarrival-time cdf and that ρ changes by changing μ.

Theorem 9: As ρ → 1 (μ → 1), σ̃_u → 1 and σ̃_l → σ̃_l(1) < 1, where σ̃_l(1) is the root σ in (0, 1) of

$$1 - \frac{1}{x} + \frac{1}{x}\, e^{-(1 - \sigma)x} = \sigma \qquad (14)$$

and

$$x = \frac{1 - e^{-x}}{1 - \phi(1)}. \qquad (15)$$

Proof: For σ̃_u, use Theorem 5 and the fact that σ_l ≤ σ̃_u. For the lower bound, note that μ → 1 and φ(μ) → φ(1) as ρ → 1, so that x in (12) approaches (15) and eq. (2) approaches (14).
Corollary 3: As ρ → 1, MRE(ρ, μ, φ(μ), b) → ∞.

We have not yet been able to treat all cases when ρ → 0. Several possibilities are covered by the next theorem.

Theorem 10: If ρ → 0 (μ → ∞), then (a) σ̃_l → 0; (b) σ̃_u → 0 when F(ε) = 0 for some ε > 0; (c) σ̃_u → σ̃_u(0) when F(0) > 0, where σ̃_u(0) is the root σ in (0, 1) of

$$\left(\frac{b - 1}{b}\right)^{\sigma} F(0)^{1 - \sigma} = \sigma. \qquad (16)$$


Proof: (a) Use Theorems 6 and 4. Note that x → 1 as μ → ∞ for x in (12). (b) Note that φ(μ) ≤ e^{−με}, so x ≥ ε/2 for sufficiently large μ. Hence, from (2), σ̃_u → 0. (c) Note that φ(μ) → F(0) and e^{−μb} → 0 as μ → ∞, so that x → 0 for x satisfying (11), x/ρ → −log[bF(0)/(b − 1)], and σ̃_u → σ̃_u(0) as claimed.

Corollary 4: If ρ → 0, then MRE(ρ, μ, φ(μ), b) → 0 when F(ε) = 0 for some ε > 0, and MRE(ρ, μ, φ(μ), b) → a for some constant a > 0 when F(0) > 0.

In Table III we display the extremal characteristics σ̃_l and σ̃_u and the maximum relative error MRE(ρ, μ, φ(μ), b) for four values of ρ and four values of b. In each case, the given transform values φ(μ), which are also displayed in Table III, are calculated for the prototype distribution used in Part II with m₁ = 2 and c² = 2. Since the mean interarrival time is 2, μ = 1/(2ρ).

It is interesting to compare Table III with Table II and the c² = 2 case of Table I. The main conclusion from Tables I and III is that the MRE is always smaller with c² than with φ(μ). For ρ = 0.9 it is smaller by about a factor of ten.

From Tables II and III, we see that σ_u < σ̃_u in all cases except ρ = 0.2 and b = 5. Also σ_l tends to be better (bigger) than σ̃_l as ρ increases and b decreases, but neither characteristic is uniformly better.

From Table III and additional cases, it is apparent that the MRE is quite insensitive to changes in ρ, varying very little from ρ = 0.2 to ρ = 0.9. Table III also shows that MRE(ρ, μ, φ(μ), b) is not monotone in ρ. The data suggest the following conjecture.

Conjecture 3: MRE(ρ, μ, φ(μ), b) is unimodal as a function of ρ with a maximum that increases with b (assuming φ(μ) is calculated for a fixed interarrival-time distribution).

Finally, note that φ(μ) > e⁻¹ ≈ 0.3679 for each ρ in Table III, so the queue based on G_u^∞ is unstable, σ̃_u → 1, and MRE(ρ, μ, φ(μ), b) → ∞ as b → ∞.


Table III—The extremal GI/M/1 characteristics and maximum relative error MRE(ρ, μ, φ(μ), b) for fixed mean and transform value φ(μ) based on the prototype distribution in Part II having mean 2 and c² = 2 (so that μ = 1/(2ρ))

Traffic        Transform value     Upper bound σ̃_u and MRE, by bound b on the interarrival-time
intensity ρ    and lower bound     distribution (multiples of the mean)
                                   b = 5          b = 10         b = 20         b = 40
0.2            φ(μ) = 0.377        σ̃_u = 0.607    σ̃_u = 0.705    σ̃_u = 0.783    σ̃_u = 0.844
               σ̃_l = 0.381         MRE = 0.573    MRE = 1.10     MRE = 1.85     MRE = 6.81
0.5            φ(μ) = 0.466        σ̃_u = 0.737    σ̃_u = 0.832    σ̃_u = 0.900    σ̃_u = 0.944
               σ̃_l = 0.563         MRE = 0.664    MRE = 1.61     MRE = 3.38     MRE = 6.81
0.7            φ(μ) = 0.518        σ̃_u = 0.834    σ̃_u = 0.898    σ̃_u = 0.942    σ̃_u = 0.969
               σ̃_l = 0.648         MRE = 0.648    MRE = 1.68     MRE = 3.74     MRE = 7.72
0.9            φ(μ) = 0.562        σ̃_u = 0.942    σ̃_u = 0.96     σ̃_u = 0.981    σ̃_u = 0.990
               σ̃_l = 0.905         MRE = 0.625    MRE = 1.69     MRE = 3.84     MRE = 8.15
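The upper-bound entries of Table III can be recomputed in the same way. The sketch below again assumes m₁ = 1. It solves (11) for x in [0, 1] by bisection, relying on our observation that the left side of (11) decreases in x; this is not a statement from the text. It then solves (2) for the upper- and lower-bound distributions. For ρ = 0.2, φ(μ) = 0.377, and b = 5 it gives σ̃_u ≈ 0.607 and an MRE close to the tabulated 0.573. Small differences can come from the rounding of φ(μ) in the table. The helper names are hypothetical.

```python
import math

def solve_sigma(phi, mu, tol=1e-14, max_iter=1_000_000):
    # Fixed-point iteration for eq. (2), started at 0 (assumes rho < 1).
    sigma = 0.0
    for _ in range(max_iter):
        nxt = phi(mu * (1.0 - sigma))
        if abs(nxt - sigma) < tol:
            return nxt
        sigma = nxt
    raise RuntimeError("fixed-point iteration did not converge")

def upper_x(phi_mu, rho, b):
    """Root x in [0, 1] of (11) with m1 = 1 (mass p = (b-1)/(b-x) on x, 1 - p on b)."""
    def excess(x):
        p = (b - 1) / (b - x)
        return p * math.exp(-x / rho) + (1 - p) * math.exp(-b / rho) - phi_mu
    lo, hi = 0.0, 1.0                     # excess(lo) >= 0 >= excess(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lower_x(phi_mu, rho):
    """Root x >= 1 of (12) with m1 = 1."""
    target = 1.0 - phi_mu
    lo, hi = 1.0, 1.0 / target
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (1.0 - math.exp(-mid / rho)) / mid > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rho, phi_mu, b = 0.2, 0.377, 5.0
mu = 1.0 / rho
xu, xl = upper_x(phi_mu, rho, b), lower_x(phi_mu, rho)
p = (b - 1) / (b - xu)
sig_u = solve_sigma(lambda s: p * math.exp(-s * xu) + (1 - p) * math.exp(-s * b), mu)
sig_l = solve_sigma(lambda s: (1 - 1 / xl) + math.exp(-s * xl) / xl, mu)
mre = (sig_u - sig_l) / (1 - sig_u)
print(round(sig_u, 3), round(sig_l, 3), round(mre, 3))
```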
IV. ADDITIONAL PARAMETER SPECIFICATIONS


We now consider several other parameters in addition to the first two moments [m₁, m₂] and the mean and the transform value [m₁, φ(μ)]. We consider two different three-parameter specifications: the first three moments [m₁, m₂, m₃], and the first two moments and the transform value [m₁, m₂, φ(μ)]. We also consider two-parameter specifications involving the transform value φ(μ(1 − ρ)), combining it with the mean and with φ(μ). Each parameter specification is considered with and without an upper bound on the distribution.

In each case the extremal distributions can be obtained from the theory of complete Tchebycheff systems by solving systems of equations. The general formulas for the extremal distributions are either displayed explicitly in Eckberg or can easily be obtained from the theory there.

To obtain the parameter values themselves, we use the two prototype distributions described in Section II of Part II. Prototype I is more variable with c² = 2.0 and Prototype II is less variable with c² = 0.8. We also consider two values of the traffic intensity: ρ = 2/3 and ρ = 9/10. Finally, we consider both an upper bound of 20 on the distribution and no upper bound. Since the means for Prototypes I and II are 2.0 and 4.0, respectively, the upper bounds are b = 10 and b = 5 times the mean, respectively. The value 20 was chosen for the bound to be consistent with the prototype distributions. All the prototype parameter values are given in Tables IV and V. The extremal probability distributions themselves are displayed in Tables VI through IX. These are probability mass functions with all mass on one, two, or three points. The points are often the distribution boundary points 0 and 20. In the case of two transform values {φ(μ), φ[μ(1 − ρ)]} the distribution is defective (positive mass at infinity) in the lower bound for Prototype I and in the upper bound for Prototype II.
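As a spot check on the extremal distributions in Table VI, the sketch below takes the two-point lower bound for [m₁, m₂, m₃] listed there for Prototype I: mass 0.906 on 1.09 and 0.094 on 10.79. It recomputes the first three moments and solves (2) with μ = 0.75 (ρ = 2/3). Because the tabulated masses and points are rounded, the results match m₁ = 2, m₂ = 12, m₃ = 119.01, and σ_l = 0.754 only approximately. The solver is the same hypothetical fixed-point sketch used earlier.

```python
import math

def solve_sigma(phi, mu, tol=1e-14, max_iter=1_000_000):
    # Fixed-point iteration for eq. (2), started at 0 (assumes rho < 1).
    sigma = 0.0
    for _ in range(max_iter):
        nxt = phi(mu * (1.0 - sigma))
        if abs(nxt - sigma) < tol:
            return nxt
        sigma = nxt
    raise RuntimeError("fixed-point iteration did not converge")

points, probs = [1.09, 10.79], [0.906, 0.094]      # Table VI, [m1, m2, m3] lower bound
moments = [sum(p * x ** k for x, p in zip(points, probs)) for k in (1, 2, 3)]
mu = 0.75                                          # rho = 2/3 with mean interarrival time 2
sigma = solve_sigma(lambda s: sum(p * math.exp(-s * x) for x, p in zip(points, probs)), mu)
print([round(m, 2) for m in moments], round(sigma, 3))
```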

The following is a list of conclusions drawn from the numerical 
results in Tables IV through IX. These conclusions represent clear 
tendencies indicated by these (and other) data, but they are not 
theorems. For example, with respect to the results in Section II, the 
first conclusion is supported in part by Corollary 1, but is limited by 
the observation before Conjecture 2. 

1. For all parameter specifications, the MRE is much less with less variability; it is much less in Table V with c² = 0.8 than in Table IV with c² = 2.0.

2. As noted in Section II, two moments and a bound on the distribution are sufficient for approximations with high traffic intensities (here MRE < 8 percent for ρ = 0.9), but not for all traffic intensities.

3. An extra moment helps significantly. Three moments and a 
bound are good enough for approximations in all cases (MRE < 10 
Table IV—Extremal characteristics and maximum relative errors for the GI/M/1 queue with various parameter specifications: Case of Prototype I (mean = 2, c² = 2)

                        ρ = 0.667                               ρ = 0.900
                        b = ∞                b = 10              b = ∞                b = 10
Given parameter values  σ_l    σ_u    MRE    σ_l    σ_u    MRE   σ_l    σ_u    MRE    σ_l    σ_u    MRE
m₁, φ(μ)                0.698  1.000  ∞      0.698  0.887  1.67  0.900  1.000  ∞      0.900  0.965  1.86
m₁, φ(μ(1 − ρ))         0.754  1.000  ∞      0.754  0.802  0.242 0.931  1.000  ∞      0.931  0.935  0.062
φ(μ), φ(μ(1 − ρ))       0.730  0.793  0.304  0.747  0.793  0.222 0.908  0.945  0.673  0.918  0.945  0.491
m₁, m₂                  0.417  0.806  2.00   0.645  0.806  0.830 0.807  0.936  2.00   0.926  0.936  0.063
m₁, m₂, m₃              0.754  0.806  0.268  0.754  0.776  0.098 0.932  0.936  0.063  0.932  0.933  0.015
m₁, m₂, φ(μ)            0.698  0.787  0.418  0.747  0.787  0.187 0.900  0.934  0.515  0.931  0.934  0.046

Characteristics of Prototype I: m₁ = 2.00, m₂ = 12.00, c² = 2.00, m₃ = 119.01; the upper bound on the distribution b is in multiples of the mean.

ρ = 0.667: μ = 0.750, φ(μ) = 0.5098, φ(μ(1 − ρ)) = 0.7073, σ = 0.7676; ρ = 0.900: μ = 0.555, φ(μ) = 0.5615, φ(μ(1 − ρ)) = 0.9046, σ = 0.9324.
Table V—Extremal characteristics and maximum relative errors for the GI/M/1 queue with various parameter specifications: Case of Prototype II (mean = 4, c² = 0.8)

                        ρ = 0.667                               ρ = 0.900
                        b = ∞                b = 5               b = ∞                b = 5
Given parameter values  σ_l    σ_u    MRE    σ_l    σ_u    MRE   σ_l    σ_u    MRE    σ_l    σ_u
m₁, φ(μ)                0.602  1.000  ∞      0.602  0.733  0.491 0.868  1.000  ∞      0.868  0.920
m₁, φ(μ(1 − ρ))         0.638  1.000  ∞      0.638  0.645  0.020 0.889  1.000  ∞      0.889  0.890
φ(μ), φ(μ(1 − ρ))       0.640  0.650  0.031  0.640  0.648  0.023 0.888  0.898  0.098  0.888  0.893
m₁, m₂                  0.417  0.676  0.799  0.571  0.676  0.324 0.807  0.893  0.804  0.885  0.893
m₁, m₂, m₃              0.637  0.676  0.120  0.637  0.650  0.037 0.890  0.893  0.028  0.890  0.890
m₁, m₂, φ(μ)            0.602  0.651  0.140  0.631  0.651  0.057 0.868  0.891  0.211  0.888  0.891

Characteristics of Prototype II: m₁ = 4.00, m₂ = 28.80, c² = 0.80, m₃ = 279.83; ρ = 0.667: μ = 0.375, φ(μ) = 0.3881, φ(μ(1 − ρ)) = 0.6589, σ = 0.6429;

ρ = 0.900: μ = 0.278, φ(μ) = 0.4613, φ(μ(1 − ρ)) = 0.8991, σ = 0.8901.


Table VI—Extremal interarrival-time distributions for the GI/M/1 queue with various parameter specifications: Case of Prototype I (m₁ = 2, c² = 2) with ρ = 2/3

Given parameter values                            Extremal        Extremal probability mass function (mass pᵢ on xᵢ)
                                                  characteristic  p₁      x₁     p₂      x₂      p₃      x₃
Upper bounds                                      σ_u
[m₁, φ(μ)]                                        1.000           1.000   0.90   -       -       -       -
[m₁, φ(μ), b]                                     0.887           0.9351  0.81   0.0619  20.00   -       -
[m₁, m₂], [m₁, m₂, m₃], and [m₁, m₂, b]           0.806           0.6667  0.00   0.3333  6.00    -       -
[m₁, φ(μ(1 − ρ)), b]                              0.802           0.9583  1.22   0.0417  20.00   -       -
[φ(μ), φ(μ(1 − ρ))] and [φ(μ), φ(μ(1 − ρ)), b]    0.793           0.4565  0.00   0.5435  3.09    -       -
[m₁, m₂, φ(μ)] and [m₁, m₂, φ(μ), b]              0.787           0.8060  0.61   0.1940  7.77    -       -
[m₁, m₂, m₃, b]                                   0.776           0.5760  0.00   0.4132  4.32    0.0107  20.00
Lower bounds                                      σ_l
[m₁, m₂, m₃] and [m₁, m₂, m₃, b]                  0.754           0.906   1.09   0.094   10.79   -       -
[m₁, φ(μ(1 − ρ))] and [m₁, φ(μ(1 − ρ)), b]        0.754           0.5787  0.00   0.4213  4.75    -       -
[φ(μ), φ(μ(1 − ρ)), b]                            0.747           0.8311  0.59   0.1689  20.00   -       -
[m₁, m₂, φ(μ), b]                                 0.747           0.4628  0.00   0.5208  3.20    0.0167  20.00
[φ(μ), φ(μ(1 − ρ))]                               0.730           0.8330  0.65   -       -       -       -
[m₁, φ(μ)], [m₁, φ(μ), b], and [m₁, m₂, φ(μ)]     0.698           0.4810  0.00   0.5190  3.85    -       -
[m₁, m₂, b]                                       0.645           0.9759  1.56   0.0241  20.00   -       -
[m₁, m₂]                                          0.417           1.000   2.00   -       -       -       -


percent). The third moment reduces the MRE by approximately a 
factor of 10. 

4. The upper bound on the distribution can make a big difference. It matters when the extremal distribution has mass on the upper bound, which occurs for either the upper or the lower extremal distribution but not for both.

5. As noted in Section III, overall the second moment is better than the transform value φ(μ) as a second parameter in addition to the mean. However, for lower traffic intensities and no bound on the distribution, φ(μ) is better for the lower bound. The second moment is always better for the upper bound. Similarly, the third moment is better overall than the transform value φ(μ) as a third parameter in addition to the first two moments. However, for lower traffic intensities and no bound on the distribution, φ(μ) is better for the upper bound.

6. The transform value φ(μ(1 − ρ)) is always better than the transform value φ(μ), since its argument μ(1 − ρ) is closer to μ(1 − σ). Even
Table VII—Extremal interarrival-time distributions for the GI/M/1 queue with various parameter specifications: Case of Prototype I (m₁ = 2, c² = 2) with ρ = 0.9

Given parameter values                            Extremal        Extremal probability mass function (mass pᵢ on xᵢ)
                                                  characteristic  p₁      x₁     p₂      x₂      p₃      x₃
Upper bounds                                      σ_u
[m₁, φ(μ)]                                        1.000           1.000   1.04   -       -       -       -
[m₁, φ(μ), b]                                     0.965           0.9442  0.94   0.0558  20.00   -       -
[φ(μ), φ(μ(1 − ρ))] and [φ(μ), φ(μ(1 − ρ)), b]    0.945           0.5024  0.00   0.4976  3.83    -       -
[m₁, m₂], [m₁, m₂, m₃], and [m₁, m₂, b]           0.936           0.6667  0.00   0.3333  6.00    -       -
[m₁, φ(μ(1 − ρ)), b]                              0.935           0.9716  1.47   0.0284  20.00   -       -
[m₁, m₂, φ(μ)] and [m₁, m₂, φ(μ), b]              0.934           0.8060  0.61   0.1940  7.76    -       -
[m₁, m₂, m₃, b]                                   0.933           0.5760  0.00   0.4132  4.32    0.0107  20.00
Lower bounds                                      σ_l
[m₁, m₂, m₃] and [m₁, m₂, m₃, b]                  0.932           0.906   1.09   0.094   10.79   -       -
[m₁, φ(μ(1 − ρ))] and [m₁, φ(μ(1 − ρ)), b]        0.931           0.6438  0.00   0.3562  5.62    -       -
[m₁, m₂, φ(μ), b]                                 0.931           0.484   0.00   0.500   3.37    0.016   20.00
[m₁, m₂, b]                                       0.926           0.9759  1.56   0.0241  20.00   -       -
[φ(μ), φ(μ(1 − ρ)), b]                            0.918           0.9248  0.90   0.0752  20.00   -       -
[φ(μ), φ(μ(1 − ρ))]                               0.908           0.9538  0.95   -       -       -       -
[m₁, φ(μ)], [m₁, φ(μ), b], and [m₁, m₂, φ(μ)]     0.900           0.4812  0.00   0.5188  3.855   -       -
[m₁, m₂]                                          0.807           1.000   2.00   -       -       -       -


better is ¢(u(1 — co), where c is the Kraemer and Langenbach-Belz”™ 
approximation for the root o. The parameters ¢(u(1 — p) and ¢(u(1 — 
o)) do not appear very useful, however, because if it is possible to 
calculate them, it should also be easy to calculate the root o itself. On 
the other hand, an approximation for ¢(u) might be available from 
the peakedness without knowing the distribution or even without 
actually having a renewal process. Given the peakedness z, we obtain 
¢(u) for a renewal process from (10). 

7. For each parameter specification, one bound (either the upper or the lower) is "soft" and the other is "hard". The soft bound can be greatly improved by adding an additional parameter, while the hard bound cannot. The hard bound also tends to be much better than the soft bound. For example, consider the parameter pair [m1, m2]. The lower bound is soft because it can be improved substantially by specifying b or m3. On the other hand, the upper bound is hard because no improvement is obtained by specifying b or m3. Moreover, the hard upper bound is clearly much better than the soft lower bound, as measured by the distance from the actual value of the prototype distribution. Similarly, for the pair [m1, φ(μ)], the upper bound is soft and the lower bound is hard. Of course, all these bounds are tight: either they are attained by a distribution with the given parameters or, for any ε > 0, they can be attained within ε by such a distribution. This notion of limiting tightness is needed, for example, for the lower bound when specifying [m1, m2].

Table VIII—Extremal interarrival-time distributions for the GI/M/1 queue with various parameter specifications: Case of Prototype II (m1 = 4, c² = 0.8) with ρ = 2/3

                                                     Extremal
Given Parameter Values                               Character-  Extremal Probability Mass Function, Mass p_i on x_i
                                                     istic
Upper Bounds                                         σ_U      p1      x1     p2       x2     p3      x3
[m1, φ(μ)]                                           1.000    1.000   2.52   —        —      —       —
[m1, φ(μ), b]                                        0.733    0.9012  2.25   0.0988   20.00  —       —
[m1, m2], [m1, m2, m3] and [m1, m2, b]               0.676    0.444   0.00   0.556    7.20   —       —
[m1, m2, φ(μ)] and [m1, m2, φ(μ), b]                 0.651    0.6886  1.60   0.3114   9.32   —       —
[m1, m2, m3, b]                                      0.650    0.3571  0.00   0.6229   5.78   0.0199  20.00
[φ(μ), φ(μ(1 − ρ))]                                  0.650    0.8585  2.12   —        —      —       —
[φ(μ), φ(μ(1 − ρ)), b]                               0.648    0.8317  2.03   0.1683   20.00  —       —
[m1, φ(μ(1 − ρ))] and [m1, φ(μ(1 − ρ)), b]           0.645    0.3908  0.00   0.6092   6.57   —       —

Lower Bounds                                         σ_L      p1      x1     p2       x2     p3      x3
[φ(μ), φ(μ(1 − ρ))] and [φ(μ), φ(μ(1 − ρ)), b]       0.640    0.2869  0.00   0.7131   5.21   —       —
[m1, φ(μ(1 − ρ)), b]                                 0.638    0.9329  2.85   0.0671   20.00  —       —
[m1, m2, m3] and [m1, m2, m3, b]                     0.637    0.7811  2.11   0.2189   10.75  —       —
[m1, m2, φ(μ), b]                                    0.631    0.2783  0.00   0.6913   4.91   0.0304  20.00
[m1, φ(μ)], [m1, φ(μ), b] and [m1, m2, φ(μ)]         0.602    0.3094  0.00   0.6906   5.79   —       —
[m1, m2, b]                                          0.571    0.9524  3.20   0.0476   20.00  —       —
[m1, m2]                                             0.417    1.000   4.00   —        —      —       —


V. OTHER MODELS 


We have used the GI/M/1 model to study extremal distributions because the model is analytically tractable and because we believe that similar results will hold for more complicated systems. For example, Bergmann et al. have shown that the variance and higher cumulants of the equilibrium delay in a GI/G/1 system, given the first two moments of the interarrival time and service time, are maximized and minimized using the extremal distributions in Section II for the interarrival times and service times. Using F_U for the interarrival time and F_L (actually the limit as b → ∞) for the service time yields the maximum, while the reverse yields the minimum. As a consequence, Daley conjectured that related extremal properties held for the mean delay (or, equivalently, the mean queue length); see Bergmann et al., Open Problem 5.2.4 at the end of Section V in Stoyan, and Daley and Trengove. In particular, Daley conjectured that for GI/G/1 queues with the first two moments of the interarrival and service times given, the steady-state mean queue length L would be maximized and minimized using the extremal distributions in Section II for the interarrival-time and service-time distributions. Moreover, these extremal properties should still hold if only one of the distributions is allowed to vary and the other is fixed arbitrarily.

Table IX—Extremal interarrival-time distributions for the GI/M/1 queue with various parameter specifications: Case of Prototype II (m1 = 4, c² = 0.8) with ρ = 0.9

                                                     Extremal
Given Parameter Values                               Character-  Extremal Probability Mass Function, Mass p_i on x_i
                                                     istic
Upper Bounds                                         σ_U      p1      x1     p2       x2     p3      x3
[m1, φ(μ)]                                           1.000    1.000   2.78   —        —      —       —
[m1, φ(μ), b]                                        0.920    0.9120  2.45   0.0880   20.00  —       —
[φ(μ), φ(μ(1 − ρ))]                                  0.898    0.9683  2.67   —        —      —       —
[φ(μ), φ(μ(1 − ρ)), b]                               0.893    0.8999  2.41   0.1001   20.00  —       —
[m1, m2], [m1, m2, m3] and [m1, m2, b]               0.893    0.4440  0.00   0.5560   7.20   —       —
[m1, m2, φ(μ)] and [m1, m2, φ(μ), b]                 0.891    0.6886  1.60   0.3114   9.32   —       —
[m1, φ(μ(1 − ρ)), b]                                 0.890    0.4319  0.00   0.5681   7.04   —       —
[m1, m2, m3, b]                                      0.890    0.3571  0.00   0.6229   5.78   0.0199  20.00

Lower Bounds                                         σ_L      p1      x1     p2       x2     p3      x3
[m1, m2, m3] and [m1, m2, m3, b]                     0.890    0.7810  2.11   0.2190   10.75  —       —
[m1, φ(μ(1 − ρ)), b]                                 0.889    0.9480  3.12   0.0520   20.00  —       —
[m1, m2, φ(μ), b]                                    0.888    0.2978  0.00   0.6740   5.09   0.0282  20.00
[φ(μ), φ(μ(1 − ρ))] and [φ(μ), φ(μ(1 − ρ)), b]       0.888    0.3307  0.00   0.6693   5.88   —       —
[m1, m2, b]                                          0.885    0.9524  3.20   0.0476   20.00  —       —
[m1, φ(μ)], [m1, φ(μ), b] and [m1, m2, φ(μ)]         0.868    0.3163  0.00   0.6837   5.85   —       —
[m1, m2]                                             0.807    1.000   4.00   —        —      —       —

Unfortunately, we now know that neither part of this conjecture is correct in general, but the principle does apply for some systems. Of course, the GI/M/1 results in Section II are consistent with the conjecture. Daley and Trengove showed that the limiting extremal distribution F_L for the interarrival time yields the minimum mean queue length for all service-time distributions. Another system consistent with the conjecture is the K2/G/1 queue, which has an interarrival-time distribution with a rational Laplace-Stieltjes transform whose denominator has degree 2; see p. 329 of Cohen.¹⁹ As with the GI/M/1 queue, L depends on a single root of an equation involving the transform in addition to the specified parameters; see (5.205) on p. 330 of Ref. 19. Paralleling the GI/M/1 case, we have


Theorem 11: For any K2/G/1 queue with fixed interarrival-time distri- 
bution and service-time distribution partially specified by the first two 
moments, L is maximized and minimized by using the extremal distri- 
butions in Section II for the service-time distribution. 

We do not give the proof of Theorem 11; related results for K2/G/1 queues are obtained in Whitt and discussed in Part III. However, the analysis there also disproves the part of Daley's conjecture claiming that the same extremal service-time distributions should yield the maximum (minimum) mean queue length for all fixed interarrival-time distributions. The analysis in Whitt shows that the extremal distribution maximizing L depends on the interarrival-time distribution. For example, if the interarrival-time distribution is the convolution of two exponential distributions, then L is minimized by letting the service-time distribution be the upper-bound two-point distribution with mass c²/(1 + c²) on 0. On the other hand, if the interarrival-time distribution is the mixture of two exponential distributions, then L is maximized by letting the service-time distribution be this upper-bound two-point distribution. (See Section VII of Part III for further discussion.)

We also succeeded in disproving the first part of the conjecture by identifying a service-time distribution that produces a smaller mean queue length than either extremal distribution in Section II for the D/G/1 queue. Daley and Trengove showed that the D arrival process, obtained via the limiting extremal distribution F_L, yields the minimum given any service-time distribution. Therefore this counterexample applies to the global minimum as well as to the minimum given a fixed interarrival distribution. The particular service-time distribution we used for our numerical example had all its mass on multiples of the constant interarrival time. Daley (private communication) subsequently observed that recent results of Ott for the D/G/1 queue imply that these special service-time distributions are in fact extremal for the D/G/1 queue.

The extremal distributions for the different parameter specifications in this paper should also be useful to give an indication of the range of possibilities in more complicated models. Even if the extremal distributions here are not actually extreme for the descriptive characteristics of the more complicated model, these distributions should give a good idea of the range for the given parameters.

Table X—The extreme values for the blocking probability in a GI/M/1 loss system, which is the transform value φ(μ), given the service rate μ and the moments of the interarrival time

                                                  Prototype Distribution I,   Prototype Distribution II,
                                                  c² = 2.0                    c² = 0.8
Given Parameter Values                            ρ = 2/3     ρ = 9/10        ρ = 2/3     ρ = 9/10
Upper Bounds
[m1, m2], [m1, m2, b] and [m1, m2, m3]            0.670       0.678           0.481       0.519
[m1, m2, m3, b]                                   0.592       0.613           0.428       0.482
The actual blocking probability                   0.510       0.562           0.388       0.461
Lower Bounds
[m1, m2, m3] and [m1, m2, m3, b]                  0.400       0.495           0.358       0.446
[m1, m2, b]                                       0.304       0.411           0.287       0.392
[m1, m2]                                          0.223       0.329           0.223       0.329

It should be remembered, however, that the model affects which parameters are most useful. For a central-server closed network of queues, Lazowska found percentiles much better than moments. Our GI/M/1 delay system results are also very different from Eckberg's GI/M/k loss system results. The GI/M/k blocking probability depends on the k parameters φ(jμ), j = 1, 2, ..., k. Hence, all the extremal distributions are extreme for this descriptive characteristic given the various parameter sets. However, φ(μ) strongly dominated m2 as a second parameter in addition to the mean.

To make a specific comparison, we consider the GI/M/1 loss system (no waiting room). For this system the blocking probability is just the transform value φ(μ). By Proposition 1, the extremal distributions in Sections II through IV are extreme for φ(μ). In Table X we display the extreme values of the blocking probability given the first two and first three moments, with and without the upper bound on the distribution. It is evident that the absolute and relative errors for φ(μ) are much greater than for σ and L.
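The [m1, m2] rows of Table X can be checked directly, because the blocking probability is φ(μ) = E[e^{−μX}]. The Python sketch below is a minimal illustration and is not code from the paper. It makes three assumptions:

- The upper-bound extremal distribution puts mass c²/(1 + c²) on 0 and 1/(1 + c²) on (1 + c²)m1, as the [m1, m2] rows of Tables VII through IX indicate.
- The limiting lower-bound distribution is a point mass at m1.
- The service rate is μ = 1/(ρ m1).

The function names are hypothetical.

```python
import math


def two_point_upper(m1, c2):
    """Upper-bound extremal pmf for (m1, c^2): mass c2/(1+c2) at 0, the rest at (1+c2)*m1."""
    return [(0.0, c2 / (1.0 + c2)), ((1.0 + c2) * m1, 1.0 / (1.0 + c2))]


def point_mass(m1):
    """Limiting lower-bound distribution for [m1, m2]: all mass at the mean."""
    return [(m1, 1.0)]


def blocking_probability(pmf, mu):
    """GI/M/1 loss system: blocking probability = phi(mu) = sum of p * exp(-mu * x)."""
    return sum(p * math.exp(-mu * x) for x, p in pmf)


def table_x_bounds(m1, c2, rho):
    """Upper and lower extreme blocking probabilities given only [m1, m2]."""
    mu = 1.0 / (rho * m1)  # traffic intensity rho = 1/(mu*m1)
    upper = blocking_probability(two_point_upper(m1, c2), mu)
    lower = blocking_probability(point_mass(m1), mu)
    return upper, lower


for name, m1, c2 in [("I", 2.0, 2.0), ("II", 4.0, 0.8)]:
    for rho in (2.0 / 3.0, 0.9):
        up, lo = table_x_bounds(m1, c2, rho)
        print(f"Prototype {name}, rho = {rho:.3f}: upper {up:.4f}, lower {lo:.4f}")
```

The computed values should agree with the printed [m1, m2] entries of Table X to within about 0.001. For example, with m1 = 2, c² = 2 and ρ = 2/3, the upper bound is 2/3 + e^{−4.5}/3 and the lower bound is e^{−1.5}.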
This paper continues the investigation begun in Part I of approximations
for queues that are based on a few parameters partially characterizing the 
arrival process and the service-time distribution. Part I provides insight into 
approximations for intractable systems by considering the set of all possible 
values of the mean queue length in the GI/M/1 queue given the service rate 
and the first two moments of the interarrival-time distribution. The distribu- 
tions yielding the maximum and minimum values of the mean queue length 
turn out to be quite unusual, e.g., two-point distributions. This paper shows 
that the range of possible values can be reduced dramatically by imposing 
realistic shape constraints on the interarrival-time distribution with given 
first two moments. We found extremal distributions in the presence of shape 
constraints by restricting our attention to discrete distributions with all mass 
on a fixed finite set of points and solving nonlinear programs. The results 
strongly support the use of two-moment approximations in general queueing 
systems when the interarrival-time and service-time distributions are not too 
irregular. 


I. INTRODUCTION AND SUMMARY 


This paper continues the investigation begun in Part I of the set of possible values of the mean queue length L (number in system) in a GI/M/1 queue. The setting is a given service rate μ and various parameters that partially characterize the interarrival-time cdf F (e.g., the first two moments m1 and m2). As explained in Part I, we are not primarily interested in the GI/M/1 model itself. We wish to provide a basis for evaluating approximations for more complex queueing models, such as the nodes in a non-Markov network of queues. For such complex models, the arrival process may be approximated by a renewal process, partially characterized by the first two moments of the renewal interval. Then the GI/M/1 model arises as an approximation, and there is no complete interarrival-time distribution for an exact solution. We examine the GI/M/1 queue because it is tractable and because we believe it is indicative of what happens more generally.
For the GI/M/1 queue,

$$L = \frac{\rho}{1 - \sigma}, \tag{1}$$

where ρ is the traffic intensity (ρ = 1/μm1) and σ is the unique root in the open interval (0, 1) of the equation

$$\phi[\mu(1 - \sigma)] = \sigma, \tag{2}$$

with φ(s) the Laplace-Stieltjes transform of the interarrival-time cdf F:

$$\phi(s) = \int_0^\infty e^{-st}\, dF(t). \tag{3}$$
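Equations (1) through (3) reduce the mean queue length to a one-dimensional root-finding problem. The Python sketch below is a minimal illustration, not the authors' code. It brackets the root of (2) by bisection, using the following facts for a discrete interarrival distribution:

- g(s) = φ(μ(1 − s)) − s is convex in s.
- g is positive at s = 0.
- g has roots at σ and at 1.

So when ρ < 1, g changes sign exactly once on (0, 1 − δ) for small δ. The offset δ = 10⁻⁶ and the function names are assumptions made for illustration.

```python
import math


def phi(pmf, s):
    """Laplace-Stieltjes transform (3) of a discrete distribution given as (point, mass) pairs."""
    return sum(p * math.exp(-s * x) for x, p in pmf)


def gim1_root(pmf, mu, tol=1e-12):
    """Return (sigma, L) for a GI/M/1 queue by bisection on (2); L follows from (1)."""
    m1 = sum(p * x for x, p in pmf)
    rho = 1.0 / (mu * m1)
    if rho >= 1.0:
        raise ValueError("unstable queue: rho >= 1")

    def g(s):
        # Convex in s, with g(0) > 0, g(sigma) = 0 and g(1) = 0.
        return phi(pmf, mu * (1.0 - s)) - s

    lo, hi = 0.0, 1.0 - 1e-6
    if g(hi) >= 0.0:
        raise ValueError("bracket too close to 1 for this rho; shrink the 1e-6 offset")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    sigma = 0.5 * (lo + hi)
    return sigma, rho / (1.0 - sigma)


# m1 = 2, c^2 = 2, rho = 2/3, so mu = 1/(rho * m1) = 0.75.
sigma_u, L_u = gim1_root([(0.0, 2.0 / 3.0), (6.0, 1.0 / 3.0)], 0.75)  # two-point upper bound
sigma_l, L_l = gim1_root([(2.0, 1.0)], 0.75)  # deterministic (limiting lower bound)
print(round(sigma_u, 3), round(sigma_l, 3), round((L_u - L_l) / L_l, 2))
```

The usage lines feed in the two [m1, m2] extremal distributions for m1 = 2 and c² = 2: mass 2/3 at 0 and 1/3 at 6, and a point mass at 2. The results should be close to the two-moment row for Prototype I at ρ = 2/3 in Table II, σ_U = 0.806 and σ_L = 0.417, so the relative spread in L is about 200 percent.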


Unfortunately, given m1, m2, and ρ, the range of possible values of L can be very wide. (See the example in Section I of Part I.) This wide
range naturally raises doubts about the value of two-moment approx- 
imations, but the particular distributions yielding the extreme values 
of L suggest that the approximations may still be useful. As we 
indicated in Part I, these extremal distributions are discrete probabil- 
ity distributions with positive probability on just two points. These 
two-point distributions are obviously very unusual. We would hope 
that for typical (nice) distributions L would not vary much among 
interarrival-time distributions with the same moments. In this paper, 
we investigate how much the range is reduced by imposing regularity 
conditions on the interarrival-time distribution. The regularity conditions we consider are shape constraints such as unimodality and log-convexity (a natural smoothness condition; see Chapter 5 of Keilson and Section II).

A major contribution here, we believe, is the method. To study the 
effect of the shape constraints, we restrict attention to discrete distri- 
butions with all mass on a fixed finite set of points. We then find the 
range of the mean queue length L by means of nonlinear programming. 

Since typical interarrival-time distributions are smooth (have den- 
sities), some may distrust results based on discrete distributions. 
However, continuity theorems show that there is no loss of generality, 
at least in principle, in considering distributions concentrating on a 
fixed finite set of points; see Section 11 of Borovkov. With enough
points, such discrete distributions can be used to approximate an 
arbitrary interarrival-time distribution arbitrarily well (in the usual 
sense of convergence in distribution and convergence of moments). In 
turn, the queue-length distribution and the mean queue length L 
associated with finite-valued probability mass functions can be used 
to approximate the queue-length distribution and the mean queue 
length L associated with the arbitrary interarrival-time distribution. 

The point is that we need not worry about the local behavior of the interarrival-time distribution. For sufficiently small positive ε, suppose we change an interarrival-time density, say f(t), only on the interval [t_0, t_0 + ε], for example, by making

$$f_n(t) = \begin{cases} f(t), & t \notin [t_0, t_0 + \epsilon], \\ n\, f[t_0 + n(t - t_0)], & t_0 \le t \le t_0 + \epsilon/n, \\ 0, & t_0 + \epsilon/n < t \le t_0 + \epsilon. \end{cases}$$

Then the new density f_n(t) will be very different from the density f(t) on [t_0, t_0 + ε] for large n. However, the associated cdf's will be close, and the behavior of the associated queueing systems will be virtually indistinguishable.
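To see why the cdf's stay close, substitute u = t_0 + n(t − t_0) in the middle piece of the definition of f_n. This short check shows that f_n moves no probability mass into or out of [t_0, t_0 + ε]:

$$\int_{t_0}^{t_0+\epsilon/n} n\, f[t_0 + n(t - t_0)]\, dt = \int_{t_0}^{t_0+\epsilon} f(u)\, du = F(t_0 + \epsilon) - F(t_0).$$

Hence the cdf F_n of f_n agrees with F outside (t_0, t_0 + ε). Inside that interval, F_n(t) − F(t_0) and F(t) − F(t_0) both lie in [0, F(t_0 + ε) − F(t_0)], so

$$\sup_t |F_n(t) - F(t)| \le F(t_0 + \epsilon) - F(t_0),$$

which is small for small ε, uniformly in n.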

While there is no loss in generality in restricting attention to discrete 
distributions, it is not clear how many points are enough and where 
they should be located. We have not made a systematic investigation 
of this question, but we believe that we have used enough points in 
our study. It is important to recognize that extra points are not free 
because the nonlinear programs typically become harder to solve. 

Throughout this paper, we use 21 points on the integers {0, 1, 2, ..., 20}. By comparing the programming results without shape
constraints here with the theoretical results based on the complete 
Tchebycheff systems in Part I, we can see the effect of the discreteness. 
This effect can be seen in Tables II and V. The upper bound 20 on 
the support of the distribution (which might not be regarded as an 
essential aspect of the discreteness) can have a significant impact, but 
otherwise the discreteness matters little. 

Do the shape constraints help? For the GI/M/1 example, with ρ = 2/3 and c² = 2, assuming a log-convex probability mass function
reduces the maximal possible error in L from 200 percent to 8 percent. 
If the third moment is fixed as well, the maximal possible error is less 
than 1 percent. 

These results indicate that two-moment approximations can be very 
useful, provided the interarrival-time distribution is actually not un- 
usually irregular. In this paper we only study the GI/M/1 queue, but 
we believe the results are indicative of what happens in GI/G/1 queues 
and more general systems. 
On the other hand, even for the GI/M/1 queue, the results do not
imply that the two-moment approximations will work well in all 
circumstances or that they should be used blindly. If it is known that 
the interarrival-time distribution has an unusual shape, then the 
approximation should probably be modified. If additional information 
is known that would permit working with a third parameter such as 
the third moment or the peakedness, then better results can be 
expected. As noted by Kuczura in a related context, a third parameter
seems to offer the possibility of significant improvement, but addi- 
tional parameters are rarely worth the effort. 

This paper is organized as follows. In Section II, we define prototype 
distributions, introduce the shape constraints, and formulate the 
mathematical programs. The prototype distributions are intended to 
be typical interarrival-time distributions, which we use to generate 
parameter values and the “exact” queue characteristics σ and L. In
Section III, we discuss the computational results for shape constraints 
with the first two moments fixed. In Section IV, we discuss the results 
for shape constraints with other parameters fixed. In Section V, we 
compare our results to other bounds and approximations. Finally, in 
Section VI, we discuss mathematical programming issues. It turns out 
that solving the nonlinear programs was not routine. These queueing 
problems may be interesting test problems for nonlinear programming 
codes. 

We conclude this introduction by mentioning an interesting out- 
come of our experiments. Unlike Part I, the approach here is primarily 
numerical, being based on nonlinear programs, but the extremal 
distributions yielding the minimum and maximum values of L obtained 
from the nonlinear programs exhibit regularity that suggests the 
possibility of an analytic treatment similar to Part I. The extremal 
distributions we obtain on the set {0, 1, ..., 20} have special structure and evidently do not depend on the traffic intensity. Hence, it may be possible to obtain analytic characterizations; this is a promising direction of research. (In fact, an analytic approach to shape constraints is carried out in Part III, but not for discrete distributions and not for
the shape constraints considered here.) Also, the robustness of the 
extremal distributions suggests that, just as with the extremal distri- 
butions in Part I, they should be useful in other contexts, e.g., to study 
the quality of two-moment approximations for inventory and reliabil- 
ity models as well as other queues. 


II. PROTOTYPE DISTRIBUTIONS, SHAPE CONSTRAINTS, AND
NONLINEAR PROGRAMS


2.1 Prototype distributions 


To compare alternate parameter specifications and shape constraints in a consistent and meaningful way, we introduce two “prototype” distributions. The specified parameter values, e.g., the moments, will be the parameter values of one of the prototype distributions. The specified shape constraints also will be satisfied by one of the prototype distributions. In this way, we guarantee that at least one reasonable probability mass function satisfies all the conditions.

Since mixtures and convolutions of two exponential distributions are frequently used in queueing, we use the discrete analogues: mixtures and convolutions of two geometric distributions. The mixture of two geometric distributions has probability mass function

$$p_k = \gamma(1 - \alpha)\alpha^k + (1 - \gamma)(1 - \beta)\beta^k, \qquad k \ge 0, \tag{4}$$

for probabilities α, β, and γ. As is often done with mixtures of exponential distributions, we assume balanced means; i.e., we assume that γα/(1 − α) = (1 − γ)β/(1 − β). The convolution of two geometric distributions, on the other hand, has probability mass function

$$p_k = \sum_{j=0}^{k} (1 - \alpha)\alpha^j (1 - \beta)\beta^{k-j}, \qquad k \ge 0, \tag{5}$$

for probabilities α and β.

To have finite support, we truncate the distributions and work with the conditional distribution given that the upper bound is not exceeded. We truncate at 20, so that the support is the set of 21 integers {0, 1, 2, ..., 20}. In each case the upper bound 20 is at least 5 standard deviations above the mean.

Mixtures of exponential and geometric distributions are relatively more variable, with squared coefficient of variation c² > 1, while convolutions are relatively less variable, with c² < 1. Hence, we consider one prototype distribution of each type. Prototype I is a truncated mixture of two geometric distributions, having c² = 2.0; Prototype II is a truncated convolution of two geometric distributions, having c² = 0.8.
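The two families in (4) and (5) are easy to generate. The following Python sketch truncates each to {0, 1, ..., 20} and renormalizes, as described above, and then reports m1 and c².

The parameter values are hypothetical illustrations, not the fitted values behind Prototypes I and II:

- Mixture: α = 0.5 and β = 0.8, with γ fixed by the balanced-means condition.
- Convolution: α = 0.5 and β = 0.7.

Obtaining the fitted prototype values would require solving the moment equations numerically. The function names are also assumptions.

```python
def mixture_pmf(alpha, beta, gamma, kmax=20):
    """Formula (4) on {0, ..., kmax}, renormalized (the truncation described above)."""
    raw = [gamma * (1 - alpha) * alpha**k + (1 - gamma) * (1 - beta) * beta**k
           for k in range(kmax + 1)]
    total = sum(raw)
    return [p / total for p in raw]


def convolution_pmf(alpha, beta, kmax=20):
    """Formula (5) on {0, ..., kmax}, renormalized."""
    raw = [sum((1 - alpha) * alpha**j * (1 - beta) * beta**(k - j) for j in range(k + 1))
           for k in range(kmax + 1)]
    total = sum(raw)
    return [p / total for p in raw]


def balanced_gamma(alpha, beta):
    """Solve the balanced-means condition gamma*alpha/(1-alpha) = (1-gamma)*beta/(1-beta)."""
    a = alpha / (1 - alpha)
    b = beta / (1 - beta)
    return b / (a + b)


def mean_and_scv(pmf):
    """Mean m1 and squared coefficient of variation c^2 = m2/m1^2 - 1."""
    m1 = sum(k * p for k, p in enumerate(pmf))
    m2 = sum(k * k * p for k, p in enumerate(pmf))
    return m1, m2 / m1**2 - 1.0


# Hypothetical parameters, chosen only to illustrate the two families.
gamma = balanced_gamma(0.5, 0.8)
mix = mixture_pmf(0.5, 0.8, gamma)
conv = convolution_pmf(0.5, 0.7)
print("mixture:", mean_and_scv(mix))
print("convolution:", mean_and_scv(conv))
```

With these illustrative parameters, the mixture has c² well above 1 and the convolution has c² below 1, matching the qualitative contrast described in the text.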

To obtain the specific prototype distributions, we start with the first two moments. For Prototype I, m1 = 2.0 and m2 = 12.0 (c² = 2.0); for Prototype II, m1 = 4.0 and m2 = 28.8 (c² = 0.8). To obtain a Prototype I distribution with the chosen values of m1 and m2, we numerically solve a system of three nonlinear equations in the three unknowns α, β, and γ. Two of these equations are the formulas for the moments m1 and m2 of the truncated distribution; the third is the “balanced means” equation. To obtain a Prototype II distribution with the chosen values of m1 and m2, we solve the system of two nonlinear equations in α and β corresponding to the moments m1 and m2 of the truncated distribution. The two prototype distributions are displayed in Table I. Additional parameters (the third moment, m3, and transform values, e.g., evaluated at the service rate μ) are given in Tables IV and V of Part I.

Table I—The two prototype distributions: probability mass functions with p_k on k

        Prototype I              Prototype II
k       p_k      p_k/p_{k+1}     p_k      p_k/p_{k+1}
0       0.3572   1.58            0.1215   0.79
1       0.2262   1.58            0.1536   1.04
2       0.1435   1.57            0.1475   1.16
3       0.0912   1.57            0.1272   1.22
4       0.0583   1.56            0.1040   1.26
5       0.0374   1.54            0.0825   1.28
6       0.0243   1.52            0.0642   1.30
7       0.0160   1.49            0.0494   1.31
8       0.0107   1.45            0.0377   1.32
9       0.0074   1.40            0.0286   1.32
10      0.0053   1.34            0.0217   1.32
11      0.0040   1.27            0.0163   1.33
12      0.0031   1.21            0.0123   1.33
13      0.0026   1.15            0.0093   1.33
14      0.0022   1.11            0.0070   1.33
15      0.0020   1.08            0.0053   1.33
16      0.0019   1.05            0.0040   1.33
17      0.0018   1.04            0.0030   1.33
18      0.0017   1.03            0.0022   1.33
19      0.0017   1.02            0.0017   1.33
20      0.0016   —               0.0013   —

Mean m1 2.00                     Mean m1 4.00
c²      2.00                     c²      0.80
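As a consistency check on Table I, the following Python sketch does two things:

- It recomputes m1 and c² from the tabulated masses.
- For Prototype I with ρ = 2/3, it solves (2) for σ by bisection. Here μ = 1/(ρ m1) = 0.75, using the nominal m1 = 2.

The masses are rounded to four decimals, and the columns sum to 1.0001 and 1.0003, so the sketch renormalizes them. Because of that rounding, agreement with the prototype value σ = 0.7676 in Table II should only be expected to about three decimals. The helper names are hypothetical.

```python
import math

# Masses p_0, ..., p_20 transcribed from Table I (rounded to four decimals).
PROTO_I = [0.3572, 0.2262, 0.1435, 0.0912, 0.0583, 0.0374, 0.0243, 0.0160, 0.0107,
           0.0074, 0.0053, 0.0040, 0.0031, 0.0026, 0.0022, 0.0020, 0.0019, 0.0018,
           0.0017, 0.0017, 0.0016]
PROTO_II = [0.1215, 0.1536, 0.1475, 0.1272, 0.1040, 0.0825, 0.0642, 0.0494, 0.0377,
            0.0286, 0.0217, 0.0163, 0.0123, 0.0093, 0.0070, 0.0053, 0.0040, 0.0030,
            0.0022, 0.0017, 0.0013]


def normalize(p):
    """Rescale rounded masses so they sum to one."""
    total = sum(p)
    return [x / total for x in p]


def mean_and_scv(p):
    """Mean m1 and squared coefficient of variation c^2 of a pmf on {0, 1, ...}."""
    m1 = sum(k * pk for k, pk in enumerate(p))
    m2 = sum(k * k * pk for k, pk in enumerate(p))
    return m1, m2 / m1**2 - 1.0


def sigma_root(p, mu, tol=1e-12):
    """Root in (0, 1) of sum_k p_k exp(-mu (1 - s) k) = s, i.e. (2) for a pmf on the integers."""
    def g(s):
        return sum(pk * math.exp(-mu * (1.0 - s) * k) for k, pk in enumerate(p)) - s

    lo, hi = 0.0, 1.0 - 1e-6  # g(lo) > 0 > g(hi) when rho < 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


p1, p2 = normalize(PROTO_I), normalize(PROTO_II)
print(mean_and_scv(p1), mean_and_scv(p2))
# Prototype I with rho = 2/3: mu = 1/(rho * m1) = 0.75 for the nominal m1 = 2.
print(round(sigma_root(p1, 0.75), 4))
```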


2.2 Shape constraints 


Mixtures and convolutions of exponential and geometric distributions have many nice properties; see Chapter 5 of Keilson. Mixtures of exponential and geometric distributions are log-convex and thus are DFR, i.e., have decreasing failure rate. For discrete distributions with probability mass functions p_k on the nonnegative integers, log-convexity means

$$p_k^2 \le p_{k-1}\, p_{k+1}, \qquad k \ge 1. \tag{6}$$

Since the ratios p_k/p_{k−1} are nondecreasing with log-convexity, the distribution changes smoothly. The failure rate is

$$r_k = p_k \Big/ \sum_{j=k}^{\infty} p_j, \qquad k \ge 0. \tag{7}$$

Decreasing failure rate of course implies that the probability mass function is decreasing. For log-convex distributions, c² ≥ 1 and m3 ≥ (3/√2)m2^{3/2} (see p. 69 of Keilson).
Convolutions of exponential and geometric distributions are log-concave, i.e., the inequality (6) is reversed. Log-concavity is equivalent to strong unimodality. A probability mass function p_k on the nonnegative integers is unimodal if there is an integer k_0 such that

$$p_k \ge p_{k-1} \ \text{for } 1 \le k \le k_0 \quad \text{and} \quad p_k \ge p_{k+1} \ \text{for } k \ge k_0. \tag{8}$$

A probability mass function p_k is strongly unimodal if its convolution with any unimodal probability mass function remains unimodal. In addition to being strongly unimodal, log-concave distributions are IFR, i.e., have increasing failure rate. For log-concave distributions, c² ≤ 1 and m3 ≤ (3/√2)m2^{3/2} (see p. 69 of Keilson).
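Conditions (6) through (8) are simple to check mechanically, and that is also how they enter the nonlinear programs of Section 2.3. The Python sketch below does the following (helper names are hypothetical):

- It tests log-convexity and log-concavity, with a small relative tolerance to guard against round-off.
- It tests unimodality with a given mode k_0.
- It computes the failure rate (7) on a finite support.

The test sequences are values of (4) and (5) on {0, ..., 20} with hypothetical parameters, not the fitted prototypes.

```python
def is_log_convex(p, rtol=1e-12):
    """Inequality (6): p_k^2 <= p_{k-1} p_{k+1} for every interior k."""
    return all(p[k] ** 2 <= p[k - 1] * p[k + 1] * (1 + rtol) for k in range(1, len(p) - 1))


def is_log_concave(p, rtol=1e-12):
    """Inequality (6) reversed: p_k^2 >= p_{k-1} p_{k+1} for every interior k."""
    return all(p[k] ** 2 * (1 + rtol) >= p[k - 1] * p[k + 1] for k in range(1, len(p) - 1))


def is_unimodal_at(p, k0):
    """Conditions (8): nondecreasing up to k0, nonincreasing from k0 on."""
    rising = all(p[k] >= p[k - 1] for k in range(1, k0 + 1))
    falling = all(p[k] >= p[k + 1] for k in range(k0, len(p) - 1))
    return rising and falling


def failure_rates(p):
    """Failure rate (7), r_k = p_k / sum_{j >= k} p_j, on a finite support."""
    rates, tail = [0.0] * len(p), 0.0
    for k in range(len(p) - 1, -1, -1):
        tail += p[k]
        rates[k] = p[k] / tail
    return rates


# Hypothetical examples on {0, ..., 20} (normalization does not affect these checks):
# a balanced mixture (4) with alpha = 0.5, beta = 0.8, gamma = 0.8,
# and a convolution (5) with alpha = 0.5, beta = 0.7.
mix = [0.8 * 0.5 * 0.5**k + 0.2 * 0.2 * 0.8**k for k in range(21)]
conv = [sum(0.5 * 0.5**j * 0.3 * 0.7**(k - j) for j in range(k + 1)) for k in range(21)]
print(is_log_convex(mix), is_unimodal_at(mix, 0))
print(is_log_concave(conv), is_unimodal_at(conv, 1))
r_mix, r_conv = failure_rates(mix), failure_rates(conv)
print(round(r_mix[0], 3), round(r_mix[-1], 3), round(r_conv[0], 3), round(r_conv[-1], 3))
```

With these illustrative parameters:

- The mixture is log-convex with mode 0, and the convolution is log-concave with mode 1, mirroring Prototypes I and II.
- The convolution's failure rate stays increasing on the finite support.
- The mixture's failure rate decreases at first but is forced up to r_20 = 1 at the truncation point. This is the kind of effect that truncation has on Prototype I.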

Of course, truncation and conditioning alter some of these proper- 
ties. For example, the failure rates are changed significantly. For 
Prototype I, the failure rate is decreasing for the first thirteen values 
but is increasing after that. The failure rate remains increasing for 
Prototype II. The mass function ratios p_k/p_{k+1} are unchanged by the
truncation, however. Also the unimodality properties are unchanged: 
Prototype I is decreasing and Prototype II is unimodal with a mode 
at 1. 

In our study, we focus on the shape constraints unaffected by the 
truncation, namely, log-convexity or log-concavity and unimodality. 
We also consider additional parameters such as the third moment, 
transform values, and constraints on the cdf F. 


2.3 The nonlinear programs 


From (1), we see that the mean queue length L depends only on the traffic intensity ρ and the root σ of (2). Since L is an increasing function of σ, the maximum and minimum values of L are attained by the maximum and minimum values of σ. For interarrival-time distributions with probability mass functions {p_k} on the set {0, 1, 2, ..., 20}, (2) becomes

$$\sum_{k=0}^{20} p_k\, e^{-\mu(1-\sigma)k} = \sigma. \tag{9}$$

To find the maximum and minimum values of σ, we solve nonlinear programs. The variables are σ and the probability masses p_k, k = 0, 1, ..., 20. The constraints specify that {p_k} is a proper probability distribution with the specified properties and that (9) holds.
Given the two interarrival-time moments m1 and m2, the upper bound b = 20 on the support of the interarrival-time distribution, the service rate μ, and no shape constraints, we have a nonlinear program (NLP) for the maximum of the form:

$$
\begin{aligned}
\text{(NLP)}\quad & \max\ \sigma && \text{(10a)}\\
\text{subject to:}\quad & \sum_{k=0}^{20} e^{-\mu(1-\sigma)k}\, p_k = \sigma, && \text{(10b)}\\
& \sum_{k=0}^{20} p_k = 1, && \text{(10c)}\\
& \sum_{k=0}^{20} k\, p_k = m_1, && \text{(10d)}\\
& \sum_{k=0}^{20} k^2 p_k = m_2, && \text{(10e)}\\
& p_k \ge 0 \quad \text{for all } k, && \text{(10f)}\\
& 0 \le \sigma \le 1 - \epsilon, && \text{(10g)}
\end{aligned}
$$

where 0 < ε < 1. For any probability mass function {p_k}, the queue is stable if and only if ρ = 1/μm1 < 1, in which case σ is the unique solution to (10b) in the open interval (0, 1). Since σ = 1 is also a solution to (10b), we rule it out by bounding σ above in (10g).
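For the program (10) without shape constraints there is a simple exact alternative to a general NLP solver. It is sketched here as an illustration and is not the method used in this paper.

The argument has four steps:

1. For a fixed trial value s of σ, the left side of (10b) is linear in the p_k.
2. For any single pmf, φ(μ(1 − s)) ≥ s holds exactly when s ≤ σ.
3. So the maximum (minimum) of σ can be found by bisection on s, where each step maximizes (minimizes) a linear function over the polytope (10c) through (10f).
4. A linear program attains its optimum at a vertex, and every vertex here is supported on at most three points of {0, ..., 20}. The sketch therefore simply enumerates all triples.

With m1 = 2, m2 = 12 and μ = 0.75 (Prototype I, ρ = 2/3), it should reproduce the "Plus discrete" row of Table II (σ_L = 0.660, σ_U = 0.806) to about three decimals. Function names are hypothetical. The enumeration does not directly handle the shape constraints, because they change the vertex structure and (6) is not linear in the p_k.

```python
import math
from itertools import combinations


def three_point_vertices(m1, m2, b=20):
    """Pmfs on {0, 1, ..., b} with at most three support points satisfying (10c)-(10f).

    For support points x, y, z the weight on x is E[(X - y)(X - z)] / ((x - y)(x - z)),
    where E[(X - y)(X - z)] = m2 - (y + z) m1 + y z.
    """
    vertices = []
    for pts in combinations(range(b + 1), 3):
        weights = []
        for i, x in enumerate(pts):
            y, z = [pts[j] for j in range(3) if j != i]
            weights.append((m2 - (y + z) * m1 + y * z) / ((x - y) * (x - z)))
        if min(weights) >= -1e-12:  # keep only nonnegative (feasible) solutions
            vertices.append([(x, max(w, 0.0)) for x, w in zip(pts, weights)])
    return vertices


def extremal_sigma(m1, m2, mu, maximize=True, b=20, tol=1e-10):
    """Bisection on s: for one pmf, phi(mu(1 - s)) >= s exactly when s <= sigma."""
    vertices = three_point_vertices(m1, m2, b)
    pick = max if maximize else min

    def best_phi(s):
        theta = mu * (1.0 - s)
        return pick(sum(w * math.exp(-theta * x) for x, w in v) for v in vertices)

    lo, hi = 0.0, 1.0 - 1e-6
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        phi_mid = best_phi(mid)
        # maximize: some pmf has sigma >= mid; minimize: every pmf has sigma > mid
        if (phi_mid >= mid) if maximize else (phi_mid > mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Prototype I with rho = 2/3: m1 = 2, m2 = 12, mu = 1/(rho * m1) = 0.75.
sigma_u = extremal_sigma(2.0, 12.0, 0.75, maximize=True)
sigma_l = extremal_sigma(2.0, 12.0, 0.75, maximize=False)
print(round(sigma_l, 4), round(sigma_u, 4))
```

Enumerating vertices is only practical here because there are just three equality constraints and 21 support points (1330 triples). With shape constraints, a general NLP or LP solver, as used in the paper, is the natural route.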

Of course, we obtain a corresponding NLP for the minimum value of σ by changing (10a) from a maximum to a minimum. If unimodality with a mode at k_0 is assumed, then we add the constraints (8) to (10). In the nonlinear program for c² = 2.0, we assumed that the mode is at 0; in the nonlinear program for c² = 0.8, we assumed that the mode is at 1. This is consistent with the location of the modes in the prototype distributions. If we were to assume only unimodality without specifying where the mode is, then we would have to solve a program for each possible mode location and then optimize over the solutions.

When log-convexity is assumed, we add the constraints (6) for k = 1, ..., 19 to (10). For log-concavity, we add the constraints (6) with the inequality reversed.


III. SHAPE CONSTRAINTS WITH TWO MOMENTS FIXED


In this section, we give the minimum and maximum values of the root σ, denoted by σ_L and σ_U, respectively, and the interarrival-time distributions yielding these extreme values of σ. From (1), we obtain the extreme values of L, denoted by L_L and L_U. We also give the maximum relative error (MRE) in L, which is computed as

$$\text{MRE} = \frac{L_U - L_L}{L_L} = \frac{\sigma_U - \sigma_L}{1 - \sigma_U}.$$
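Because ρ cancels in the ratio, the MRE depends only on the two extreme roots. The small Python sketch below (function names hypothetical) checks this identity numerically. It also recomputes the MRE from the σ_L and σ_U entries of Table II for the four log-convex or log-concave cases. These come out near 0.077, 0.024, 0.042 and 0.011, and their average of roughly 3.85 percent is consistent, up to rounding, with the 3.8 percent quoted later in this section.

```python
def mre(sigma_l, sigma_u):
    """Maximum relative error in L: (L_U - L_L)/L_L = (sigma_U - sigma_L)/(1 - sigma_U)."""
    return (sigma_u - sigma_l) / (1.0 - sigma_u)


def mean_queue_length(rho, sigma):
    """Equation (1): L = rho/(1 - sigma)."""
    return rho / (1.0 - sigma)


# Direct check of the identity for one (sigma_L, sigma_U) pair from Table II (rho = 2/3).
rho = 2.0 / 3.0
L_low, L_up = mean_queue_length(rho, 0.762), mean_queue_length(rho, 0.779)
print((L_up - L_low) / L_low, mre(0.762, 0.779))

# (sigma_L, sigma_U) from the log-convex/log-concave rows of Table II:
# Prototype I at rho = 2/3 and 9/10, then Prototype II at rho = 2/3 and 9/10.
shape_rows = [(0.762, 0.779), (0.9320, 0.9336), (0.6387, 0.6533), (0.8897, 0.8909)]
errors = [mre(lo, hi) for lo, hi in shape_rows]
print([round(e, 3) for e in errors], sum(errors) / len(errors))
```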


Table II gives the extremal characteristics and the MRE for the two prototype distributions (c² = 2.0 and 0.8), two values of the traffic intensity (ρ = 2/3 and 9/10), and five constraint cases:

1. Two moments fixed only

2. Plus an upper bound b = 20 on the support of the distribution

3. Plus discrete, all mass on {0, 1, ..., 20}

4. Plus unimodal

5. Plus log-convex (c² = 2.0) or log-concave (c² = 0.8).

The results in the first two cases, before discreteness is imposed, come from Tables IV and V of Part I. The last three cases are the solutions to the nonlinear programs described in Section 2.3. Tables III through V give the associated extremal probability mass functions. Notice that these extremal distributions are the same for both values of ρ.

Each successive case adds an additional constraint to the one before, 
so the subsets of feasible interarrival-time distributions are nested, 
and the extremal characteristics get closer to the values for the 
prototype distributions. 

The main conclusion is that with fairly strong but reasonable shape 
constraints the maximum relative error given two moments is dra- 
matically reduced, becoming small enough to justify two-moment 
approximations. In particular, with log-convexity or log-concavity the 
MRE is always less than 8 percent, with the average MRE over the 
four cases being 3.8 percent. Unimodality helps, but not enough in the
case c² = 2.0 and ρ = 2/3, yielding a 33.7-percent MRE. However,
from Tables III and IV it is apparent that the unimodal extremal 
distributions are still quite irregular. 

As in Part I, we see that the MRE gets smaller as ρ increases and c² decreases. From Table II, it is evident that this property holds with shape constraints as well as without. We also see that the upper bound of 20 on the support of the interarrival-time distribution strongly affects the minimum characteristic σ_L but does not change the maximum characteristic σ_U at all. The discreteness either has no effect (for σ_U, when c² = 2.0) or only a very small effect.

As we indicated above, there is another significant conclusion. The extremal probability distributions on the set of integers {0, 1, 2, ..., 20} obtained from the nonlinear programs evidently share an important property with the extremal distributions on [0, 20] given fixed parameters, treated in Part I: the extremal distributions computed by the nonlinear programs are evidently independent of the traffic intensity ρ.

Consider the case of no shape constraints (Table V). The extremal
distributions on the set {0, 1, ..., 20} computed by the nonlinear
Table II—The extremal characteristics, σ_L and σ_U, and maximum relative errors (MRE) for the GI/M/1 queue: the
cases of Prototype Distributions I and II, traffic intensities 2/3 and 9/10, and different shape constraints

Prototype Distribution I, c² = 2.0

Constraints on the                            ρ = 2/3                    ρ = 9/10
interarrival-time distribution         σ_L     σ_U     MRE        σ_L     σ_U     MRE
Two moments m₁ and m₂                  0.417   0.806   2.00       0.807   0.936   2.00
Plus upper bound at 20                 0.645   0.806   0.83       0.926   0.936   0.16
Plus discrete, all mass on             0.660   0.806   0.75       0.9268  0.9356  0.14
  {0, 1, 2, ..., 20}
Plus unimodal                          0.730   0.798   0.34       0.9306  0.9350  0.07
Plus log-convex                        0.762   0.779   0.08       0.9320  0.9336  0.02
The prototype distribution             0.7676  0.7676  0.00       0.9324  0.9324  0.00

Prototype Distribution II, c² = 0.8

Constraints on the                            ρ = 2/3                    ρ = 9/10
interarrival-time distribution         σ_L     σ_U     MRE        σ_L     σ_U     MRE
Two moments m₁ and m₂                  0.417   0.676   0.80       0.807   0.893   0.808
Plus upper bound at 20                 0.571   0.676   0.32       0.885   0.893   0.075
Plus discrete, all mass on             0.5732  0.6760  0.32       0.8848  0.8926  0.073
  {0, 1, 2, ..., 20}
Plus unimodal                          0.6145  0.6608  0.14       0.8878  0.8915  0.034
Plus log-concave                       0.6387  0.6533  0.04       0.8897  0.8909  0.011
The prototype distribution             0.6429  0.6429  0.00       0.8901  0.8901  0.000

Table III—The distributions minimizing the GI/M/1 mean
queue length L given two moments and the shape
constraints

        Prototype Distribution I,           Prototype Distribution II,
        c² = 2.0                            c² = 0.8
        Unimodal         Log-Convex         Unimodal         Log-Concave
  k     ρ = 2/3, 9/10    ρ = 2/3, 9/10      ρ = 2/3, 9/10    ρ = 2/3, 9/10
  0     0.2460           0.3486             0.0000           0.0619
  1     0.2460           0.2260             0.1810           0.2155
  2     0.2460           0.1465             0.1810           0.1663
  3     0.2090           0.0950             0.1810           0.1283
  4     0.0031           0.0616             0.1810           0.0990
  5     0.0031           0.0399             0.1748           0.0764
  6     0.0031           0.0259             0.0068           0.0589
  7     0.0031           0.0168             0.0068           0.0455
  8     0.0031           0.0109             0.0068           0.0351
  9     0.0031           0.0071             0.0068           0.0271
 10     0.0031           0.0046             0.0068           0.0209
 11     0.0031           0.0030             0.0068           0.0161
 12     0.0031           0.0019             0.0068           0.0124
 13     0.0031           0.0012             0.0068           0.0096
 14     0.0031           0.0008             0.0068           0.0074
 15     0.0031           0.0005             0.0068           0.0057
 16     0.0031           0.0003             0.0068           0.0044
 17     0.0031           0.0002             0.0068           0.0034
 18     0.0031           0.0001             0.0068           0.0026
 19     0.0031           0.0001             0.0068           0.0020
 20     0.0031           0.0088             0.0068           0.0016


programs are related to the analytic extremal distributions on the
interval [0, b], derived in Section I of Part I. In Part I, the distribution
yielding the upper bound of L is a two-point distribution with positive
probability mass on 0 and another point x_U. The extremal distribution
yielding the lower bound is also a two-point distribution, with mass on
a point x_L and on b. (The points x_U and x_L are determined by the
requirement that the distributions have moments m₁ and m₂.) Our
results support the following conjecture:


Conjecture 1: The extremal distributions on the set of integers {0, 1, 2,
..., b} given the same two moments m₁ and m₂ have as mass points the
triples (0, ⌊x_U⌋, ⌈x_U⌉) and (⌊x_L⌋, ⌈x_L⌉, b), respectively, where ⌊x⌋ is
the greatest integer not exceeding x and ⌈x⌉ is the least integer not less
than x. If x_U or x_L is an integer, then the three-point extremal
distribution reduces to a two-point distribution.
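Once Conjecture 1 fixes the mass points, the masses follow from three linear conditions: total mass 1, mean m₁, and second moment m₂ (= E[X²]). The Python sketch below assumes the Part I two-point points x_U = m₂/m₁ (mass on {0, x_U}) and x_L = (bm₁ − m₂)/(b − m₁) (mass on {x_L, b}), which are what matching the two moments gives; the function names are hypothetical, and degenerate inputs (for example x_U < 1) are not handled. For Prototype I (m₁ = 2, m₂ = 12) and Prototype II (m₁ = 4, m₂ = 28.8) with b = 20 it should reproduce the Case 3 rows of Table V.

```python
import math

def det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def masses_on(points, m1, m2):
    """Masses on 2 or 3 distinct points with total 1, mean m1 and second
    moment m2 (Cramer's rule on the Vandermonde system for 3 points)."""
    if len(points) == 2:
        x, y = points
        w = (m1 - x) / (y - x)  # second moment then holds automatically
        return [1.0 - w, w]
    a = [[1.0, 1.0, 1.0],
         [float(x) for x in points],
         [float(x) ** 2 for x in points]]
    rhs = [1.0, m1, m2]
    d = det3(a)
    out = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = rhs[r]
        out.append(det3(m) / d)
    return out

def conjecture1(m1, m2, b, which):
    """Extremal pmf {point: mass} on {0, ..., b} as described by Conjecture 1."""
    if which == "upper":
        x = m2 / m1                        # Part I upper-bound point x_U
        pts = sorted({0, math.floor(x), math.ceil(x)})
    else:
        x = (b * m1 - m2) / (b - m1)       # Part I lower-bound point x_L
        pts = sorted({math.floor(x), math.ceil(x), b})
    return dict(zip(pts, masses_on(pts, m1, m2)))

for name, m1, m2 in [("Prototype I", 2.0, 12.0), ("Prototype II", 4.0, 28.8)]:
    for which in ("upper", "lower"):
        pmf = conjecture1(m1, m2, 20, which)
        print(name, which, {k: round(w, 4) for k, w in pmf.items()})
```

Because x_U = 6 is an integer for Prototype I, its upper-bound distribution collapses to the two-point distribution of Part I, as the conjecture's last sentence anticipates.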

With no additional shape constraints, we can show that the solution
to the NLP (10) has at most three nonzero values of p_k. To see this,
consider the situation where an extreme value of σ in (10) is known
for particular values of m₁ and m₂. Then we can combine (10a) and
(10b) to form a linear objective function in the remaining variables p_k.
With this linear objective, the three linear constraints (10c), (10d),
Table IV—The distributions maximizing the GI/M/1 mean
queue length L given two moments and the shape
constraints

        Prototype Distribution I,           Prototype Distribution II,
        c² = 2.0                            c² = 0.8
        Unimodal         Log-Convex         Unimodal         Log-Concave
  k     ρ = 2/3, 9/10    ρ = 2/3, 9/10      ρ = 2/3, 9/10    ρ = 2/3, 9/10
  0     0.5778           0.4377             0.1985           0.1719
  1     0.0500           0.1571             0.1985           0.1450
  2     0.0500           0.1133             0.0629           0.1223
  3     0.0500           0.0817             0.0629           0.1031
  4     0.0500           0.0589             0.0629           0.0870
  5     0.0500           0.0425             0.0629           0.0734
  6     0.0500           0.0306             0.0629           0.0619
  7     0.0500           0.0221             0.0629           0.0522
  8     0.0500           0.0159             0.0629           0.0440
  9     0.0222           0.0115             0.0629           0.0371
 10     0.0000           0.0083             0.0629           0.0312
 11     0.0000           0.0060             0.0367           0.0262
 12     0.0000           0.0043             0.0000           0.0221
 13     0.0000           0.0031             0.0000           0.0181
 14     0.0000           0.0022             0.0000           0.0043
 15     0.0000           0.0016             0.0000           0.0001
 16     0.0000           0.0012             0.0000           0.0000
 17     0.0000           0.0008             0.0000           0.0000
 18     0.0000           0.0006             0.0000           0.0000
 19     0.0000           0.0004             0.0000           0.0000
 20     0.0000           0.0003             0.0000           0.0000


and (10e), and the bounding constraints (10f), we can determine the
values for the p_k by solving a linear program for which only three
variables will be in the basis. Hence, to establish Conjecture 1, it
suffices to verify that the special three-point distributions are optimal
among all feasible three-point distributions for all these objective
functions, i.e., for all arguments of the transform. Of course, if the
extremal mass points x_L and x_U for the distributions on [0, b] are
integers, then these extremal distributions on [0, b] are feasible for the
smaller set {0, 1, ..., b} and are thus still optimal. This happens here
for the upper bound with c² = 2.0.
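The linear-programming argument above can be checked by brute force for particular transform arguments. For a fixed argument s, optimizing Σ e^{−sk} p_k subject to total mass 1 and the two moments is a linear program with three equality constraints, so an optimum occurs at a basic feasible solution with at most three positive p_k. The Python sketch below enumerates every three-point support in {0, ..., b}, keeps the feasible ones, and reports the support of the best. This is plain vertex enumeration, not the nonlinear programming codes described in Section VI, and the function names are hypothetical. For Prototype I it should return the Table V supports {0, 6} and {1, 2, 20} at each argument tried.

```python
import itertools
import math

def det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def basic_solution(pts, m1, m2):
    """Masses on three distinct points with total 1, mean m1, second moment m2."""
    a = [[1.0, 1.0, 1.0], [float(x) for x in pts], [float(x) ** 2 for x in pts]]
    rhs = [1.0, m1, m2]
    d = det3(a)
    sol = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = rhs[r]
        sol.append(det3(m) / d)
    return sol

def vertices(m1, m2, b, tol=1e-12):
    """All basic feasible solutions of the moment-constrained LP on {0, ..., b}."""
    for pts in itertools.combinations(range(b + 1), 3):
        sol = basic_solution(pts, m1, m2)
        if min(sol) >= -tol:                      # feasible: masses nonnegative
            yield {k: max(w, 0.0) for k, w in zip(pts, sol)}

def extremal_support(m1, m2, b, s, sense):
    """Support of the vertex maximizing or minimizing sum_k exp(-s k) p_k."""
    def objective(pmf):
        return sum(w * math.exp(-s * k) for k, w in pmf.items())
    pick = max if sense == "max" else min
    best = pick(vertices(m1, m2, b), key=objective)
    return sorted(k for k, w in best.items() if w > 1e-9)  # drop zero masses

for s in (0.25, 0.75, 1.5):
    print(s, extremal_support(2.0, 12.0, 20, s, "max"),
          extremal_support(2.0, 12.0, 20, s, "min"))
```

Checking a handful of arguments is of course only supporting evidence for the "all arguments" requirement, not a proof of Conjecture 1.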

The following conjecture for the cases with shape constraints is also
supported by our experiments (we solved the programs for traffic
intensities ranging from 0.01 to 0.9):

Conjecture 2: For each kind of shape constraint considered, the
extremal interarrival-time distributions on {0, 1, 2, ..., b} for the
GI/M/1 queue, given the first two moments of the interarrival-time
distribution, are independent of the traffic intensity ρ.

Moreover, there is an obvious regularity in the extremal unimodal 
Table V—The extremal GI/M/1 interarrival-time distributions without
shape constraints: the effect of discreteness and an upper bound on
the support of the distribution

Prototype Distribution I, c² = 2.0

Upper bounds, σ_U        p₁       x₁       p₂       x₂       p₃       x₃
Cases 1, 2, and 3        0.6667   0.000    0.3333   6.00     —        —

Lower bounds, σ_L        p₁       x₁       p₂       x₂       p₃       x₃
Case 3                   0.4211   1.000    0.5555   2.00     0.0234   20.00
Case 2                   0.9759   1.556    0.0241   20.00    —        —
Case 1                   1.0000   2.000    —        —        —        —

Prototype Distribution II, c² = 0.8

Upper bounds, σ_U        p₁       x₁       p₂       x₂       p₃       x₃
Cases 1 and 2            0.444    0.000    0.556    7.20     —        —
Case 3                   0.4429   0.000    0.4571   7.00     0.1000   8.00

Lower bounds, σ_L        p₁       x₁       p₂       x₂       p₃       x₃
Case 3                   0.7529   3.000    0.2000   4.00     0.0471   20.00
Case 2                   0.9524   3.200    0.0476   20.00    —        —
Case 1                   1.000    4.000    —        —        —        —

Note: The cases are described at the beginning of Section II of this paper. x_i is the
ith point with positive probability mass; p_i is the probability mass at point x_i.


distributions: they have only a few points of mass change. This can be
explained by making a change of variables. For unimodal distributions
on {0, 1, ..., b} with a mode at 0, we can make the change of variables

$$q_k = (k+1)\,(p_k - p_{k+1}), \qquad k \ge 0, \tag{12}$$

with p_{b+1} = 0. Then q_k ≥ 0 for all k, and the constraints for p_k become
the following constraints for q_k:


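$$\sum_{k=0}^{b} q_k = 1, \qquad \sum_{k=0}^{b} k\,q_k = 2m_1, \tag{13}$$

and

$$\sum_{k=0}^{b} k^2 q_k = 3m_2 - m_1. \tag{14}$$

Moreover, for a transform argument s, the linear objective function $\sum_{k=0}^{b} e^{-sk} p_k$ is transformed into
the linear objective function

$$\sum_{k=0}^{b} \frac{q_k}{k+1} \sum_{j=0}^{k} e^{-sj}.$$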


Solving the transformed linear program yields three-point solutions.
Hence, the extreme points for the decreasing distributions with
unimodal constraints have at most three points of decrease after 0. For
decreasing probability mass functions, we thus make the following
conjecture.

Conjecture 3: Let (0, x_U) and (x_L, b) be the pairs of mass points for the
extremal distributions on [0, b] given the first two moments 2m₁ and
3m₂ − m₁, obtained from Section II of Part I. Then, for the
GI/M/1 queue characteristics, the extremal decreasing probability mass
functions on the set of integers {0, 1, 2, ..., b} given the first two
moments m₁ and m₂ have as points of decrease the triples (0, ⌊x_U⌋, ⌈x_U⌉) and
(⌊x_L⌋, ⌈x_L⌉, b), respectively. (This completely determines the extremal
probability distributions.)
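Conjecture 3 can be turned into explicit probability mass functions and compared with Tables III and IV. Because each q_k in (12) weights a uniform distribution on {0, ..., k}, the q-distribution has moments 2m₁ and 3m₂ − m₁. The Python sketch below takes the two-point points for those moments (x_U = M₂/M₁ and x_L = (bM₁ − M₂)/(b − M₁), assumed from Part I), splits each non-integer point into its two neighboring integers, solves for the three masses, and maps back through p_k = Σ_{j≥k} q_j/(j + 1), which inverts (12). The function names are hypothetical. For Prototype I (m₁ = 2, m₂ = 12, b = 20) the results should agree with the unimodal Prototype I columns of Table IV (upper) and Table III (lower) to the printed precision.

```python
import math

def det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def masses_on(points, mean, second):
    """Masses on 2 or 3 distinct points with total 1, given mean and second moment."""
    if len(points) == 2:
        x, y = points
        w = (mean - x) / (y - x)
        return [1.0 - w, w]
    a = [[1.0] * 3, [float(x) for x in points], [float(x) ** 2 for x in points]]
    rhs = [1.0, mean, second]
    d = det3(a)
    out = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = rhs[r]
        out.append(det3(m) / d)
    return out

def conjecture3(m1, m2, b, which):
    """Conjectured extremal decreasing pmf [p_0, ..., p_b] (Conjecture 3)."""
    qm1 = 2.0 * m1                 # first moment of q, as in (13)
    qm2 = 3.0 * m2 - m1            # second moment of q, as in (14)
    if which == "upper":
        x = qm2 / qm1
        pts = sorted({0, math.floor(x), math.ceil(x)})
    else:
        x = (b * qm1 - qm2) / (b - qm1)
        pts = sorted({math.floor(x), math.ceil(x), b})
    q = [0.0] * (b + 1)
    for k, w in zip(pts, masses_on(pts, qm1, qm2)):
        q[k] = w
    p, tail = [0.0] * (b + 1), 0.0
    for k in range(b, -1, -1):     # invert (12): p_k = sum_{j >= k} q_j / (j + 1)
        tail += q[k] / (k + 1)
        p[k] = tail
    return p

for which in ("upper", "lower"):
    p = conjecture3(2.0, 12.0, 20, which)
    mean = sum(k * w for k, w in enumerate(p))
    second = sum(k * k * w for k, w in enumerate(p))
    print(which, [round(w, 4) for w in p[:11]], round(mean, 6), round(second, 6))
```

The points of decrease come out as (0, 8, 9) and (2, 3, 20), and the recovered pmfs have mean 2 and second moment 12, as (13) and (14) require.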
For other modes, say k₀, we can make a similar change of variables,
namely,
as ,) (Dhoti — Photit1)> O<js b — ko, 
Qkytji = (15) 
UF 82 ( Drotj+1 — Dho+j)s —-kj -lsjs=-1, 





with p₋₁ = p_{b+1} = 0, so that q_k is a probability mass function on {−1,
0, 1, ..., 20} for which

i=j 

Dj = » aij Ji-1,; 0 =] = b, (16) 

i=0 
where a_{ij} are appropriate constants determined by (15). As before, the
two linear moment constraints for p_k become linear moment
constraints for q_k. In addition, there is an extra linear constraint on the
q_k when k₀ > 0, since the support of q_k has one more point, i.e., is {−1,
0, 1, ..., b} instead of {0, 1, ..., b}. In particular, from (15) it is easy
to see that





b—ky 9 Rot 9 
| Gat , ;=0. 1 
» (5 = ;) Tho » (5 = ) FVhg-i (17) 


Therefore, solving the transformed linear program requires the
inclusion of (17) as a fourth linear constraint. This results in at most
four positive values among the q_k. Therefore, extremal unimodal
distributions with mode k₀ > 0 must have at most four points of mass
change, including any positive mass at 0 and 20. The unimodal extremal
distributions for Prototype II with k₀ = 1 obtained from the nonlinear
programs have this property; see Tables III and IV. For Prototype II,
we also found that the extremal characteristics σ_L and σ_U both
decreased as the mode k₀ was changed from 0 to 1 to 2; e.g., σ_L changed
from 0.633 to 0.614 to 0.603.

The numerical results also show that the extremal distributions for 
the log-convex and log-concave constraints have special regularity
that can be seen by looking at the successive ratios p_k/p_{k+1}.

Conjecture 4: In the log-convex case, the upper- (lower-) bound
distribution has constant ratios p_k/p_{k+1} for k = 1, 2, ..., b − 1
(k = 0, 1, ..., b − 2) with an extra mass at 0 (b). (This determines the
interarrival-time distribution.)

Conjecture 4 is supported by Tables III and IV. With log-convex
constraints, the upper- (lower-) bound ratios are p₁/p₂ ≈ 1.387
(p₁/p₂ ≈ 1.542).
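The ratio pattern behind Conjecture 4 is easy to inspect directly. The Python sketch below copies the log-convex Prototype I columns of Tables III and IV and prints p_k/p_{k+1} for the leading terms. Because the tabulated values have only four decimals, ratios far into the tail are dominated by rounding, so only the first few are examined. The function name is hypothetical.

```python
# Log-convex columns for Prototype Distribution I (c^2 = 2.0), k = 0..20,
# copied from Table III (lower bound) and Table IV (upper bound).
LOWER = [0.3486, 0.2260, 0.1465, 0.0950, 0.0616, 0.0399, 0.0259, 0.0168,
         0.0109, 0.0071, 0.0046, 0.0030, 0.0019, 0.0012, 0.0008, 0.0005,
         0.0003, 0.0002, 0.0001, 0.0001, 0.0088]
UPPER = [0.4377, 0.1571, 0.1133, 0.0817, 0.0589, 0.0425, 0.0306, 0.0221,
         0.0159, 0.0115, 0.0083, 0.0060, 0.0043, 0.0031, 0.0022, 0.0016,
         0.0012, 0.0008, 0.0006, 0.0004, 0.0003]

def successive_ratios(p, upto):
    """Ratios p_k / p_{k+1} for k = 0, ..., upto - 1."""
    return [p[k] / p[k + 1] for k in range(upto)]

lower_r = successive_ratios(LOWER, 9)
upper_r = successive_ratios(UPPER, 9)
print("lower:", [round(r, 3) for r in lower_r])   # roughly constant from k = 0
print("upper:", [round(r, 3) for r in upper_r])   # k = 0 differs: extra mass at 0
print("lower p_19/p_20:", LOWER[19] / LOWER[20])  # extra mass at b = 20
```

The lower-bound ratios stay near 1.542 from k = 0, while the upper-bound ratios stay near 1.387 only from k = 1 on, matching the extra mass at 0 (upper) and at b (lower) described in the conjecture.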


IV. SHAPE CONSTRAINTS WITH OTHER PARAMETER SPECIFICATIONS 


In this section, we investigate different parameter specifications,
both with and without shape constraints. However, attention is
focused on the cases with shape constraints because alternate parameter
specifications without shape constraints were considered in Sections
III and IV of Part I. We consider only Prototype Distribution I with
the traffic intensity ρ = 2/3. This is the difficult case in Section III,
yielding the largest maximum relative errors.

Table VI contains the major results. It gives the maximum relative
errors in L for various combinations of two and three parameters with
no shape constraints and with log-convex shape constraints. Of course,
we still consider the first two moments m₁ and m₂. The additional
parameters that we consider are the third moment m₃, the
Laplace-Stieltjes transform evaluated at the service rate, φ(μ), and the
interarrival-time cdf evaluated at k, F(k), i.e., F(k) = p₀ + p₁ + ... + p_k.
These parameters are fixed at the values satisfied by Prototype I. In
particular, we use m₃ = 119.01, φ(μ) = 0.5098, F(0) = 0.35724, F(2) =
0.72692, and F(7) = 0.95409. Combinations of two parameters are


Table VI—A comparison of alternate second- and third-parameter
specifications: the maximum relative error (MRE) in the mean queue
length L in a GI/M/1 queue, based on Prototype I with ρ = 2/3

The second parameter in addition to m₁
                            m₂      φ(μ)    F(0)    F(2)    F(7)
No shape constraint         0.75    1.53    3.10    5.09    3.81
Plus log-convexity          0.077   0.083   0.096   0.370   0.555

The third parameter in addition to m₁ and m₂
                            m₃      φ(μ)    F(0)    F(2)    F(7)
No shape constraint         0.069   0.175   0.331   0.604   0.609
Plus log-convexity          0.009   0.012   0.019   0.049   0.020

Note: m_k is the kth moment, φ(μ) is the Laplace-Stieltjes transform evaluated at the
service rate μ, and F(k) is the cdf evaluated at k, i.e., F(k) = p₀ + p₁ + ... + p_k, of
Prototype Distribution I. These values are m₁ = 2.00, m₂ = 12.00, m₃ = 119.01, φ(μ) =
0.5098, F(0) = 0.35724, F(2) = 0.72692, and F(7) = 0.95409. The distribution itself
appears in Table I. All these results are obtained from the nonlinear programs.
formed by specifying each of the additional parameters together with
the first moment m₁. Combinations of three parameters are formed by
specifying each of the additional parameters together with the first
two moments m₁ and m₂.

The first conclusion is that, with log-convexity, the third moment
or almost any other third parameter in addition to the first two
moments makes the maximum relative error negligible. For the third
moment, the maximum relative error is less than one percent, and for
all but one of the other third parameters it is at most two percent.
This suggests that with nice distributions three-moment
approximations ought to work very well for more general models.

The second conclusion is that the next higher moment is the best 
additional parameter in all cases. However, the advantage of the 
moment over the transform value decreases dramatically with log- 
convexity. Although the cdf constraints certainly reduce the MRE, 
the next higher moment and the transform value perform better as 
additional parameters. 

In order to have the maximum relative error small enough to justify
approximations, say less than 10 percent, it appears that three
constraints are enough. It suffices to specify either three moments without
shape constraints (6.9-percent MRE) or two moments with
log-convex shape constraints (7.7-percent MRE). We can think of
log-convexity as being roughly equivalent to another moment parameter.

The values of σ, the GI/M/1 probability of delay, for the various
parameter specifications and shape constraints are given in Tables
VII and VIII. From Table VII we see an interesting reversal with
log-convexity. Without shape constraints, the next higher moment is
better than the transform value φ(μ) as an additional parameter for
the upper bound but not as the second parameter for the lower
bound. With log-convexity, these orderings are reversed. From Table
VIII, we see that F(0) is significantly better than the other two cdf
values as an additional parameter for the upper bound with
log-convexity and for the lower bound with no shape constraints, but not
in the other cases.

We also tabulated the extremal interarrival-time distributions for
the different combinations of parameters and shape constraints, but
they have been omitted to save space. As in Section III, these extremal
distributions have important regularity properties. With k parameters
and no shape constraints, the extremal distributions have at most k + 2
positive mass points; with k parameters and a decreasing mass
function, the extremal distributions have at most k + 2 points of mass
change after 0. As in Section III, there is also regularity in the extremal
distributions in the log-convex case, which can be seen by looking at
the successive ratios p_k/p_{k+1}. There appear to be only a few points
Table VII—The GI/M/1 extremal characteristics σ (the probability of
delay) given c² = 2.0, ρ = 2/3, and the different shape constraints

                                  Parameters specified in addition to the mean
                                  m₂       φ(μ)     φ(μ(1 − ρ))  m₂, m₃   m₂, φ(μ)
σ_U   No shape constraints        0.8057   0.8811   0.7994       0.7768   0.7844
      Unimodal                    0.7983   0.8203   0.7777       0.7708   0.7777
      Log-convex                  0.7790   0.7736   0.7690       0.7694   0.7680
      Prototype distribution      0.7675   0.7675   0.7675       0.7675   0.7675
σ_L   Log-convex                  0.7621   0.7542   0.7638       0.7673   0.7652
      Unimodal                    0.7374   0.7192   0.7574       0.7615   0.7606
      No shape constraints        0.6601   0.6986   0.7545       0.7548   0.7466


Table VIII—The extremal characteristics σ (the probability of delay)
for the GI/M/1 queue given values of the cumulative distribution
function F in addition to the first moment, m₁, or the first two
moments, m₁ and m₂: the case of Prototype Distribution I with traffic
intensity ρ = 2/3

                               Additional parameter        Additional parameter
                               with m₁                     with m₁ and m₂
                               F(0)     F(2)     F(7)      F(0)     F(2)     F(7)
σ_U   No shape constraint      0.9093   0.9135   0.9034    0.7927   0.8019   0.8042
      Unimodal                 0.8523   0.8558   0.8408    0.7867   0.7940   0.7959
      Log-convex               0.7760   0.8278   0.8385    0.7684   0.7781   0.7706
σ_L   Log-convex               0.7546   0.7641   0.7489    0.7639   0.7670   0.7659
      Unimodal                 0.6849   0.6683   0.6859    0.7544   0.73888  0.7427
      No shape constraint      0.6280   0.47386  0.5351    0.7241   0.6823   0.6849


where these ratios change. Including the final mass point, for k
parameters there appear to be k points where the ratios change. Given
m₂ and m₃, the ratios p_{k−1}/p_k change for the lower bound at k ∈ {10,
11} and for the upper bound at k ∈ {2, 20}. Given only m₂, the ratio
changes for the lower bound at k = 20 and for the upper bound at
k = 2.

Although we report results for only a single value of the traffic
intensity ρ, it also appears that the extremal distributions do not
change with ρ: the numerical solutions to the nonlinear programs
were indistinguishable for a range of ρ values tested from 0.01 to 0.9.
There are natural extensions of Conjectures 1 through 4 to other
parameter specifications.

We also found the extreme values of the transform value φ(μ),
which is the blocking probability for the associated GI/M/1 loss
system, for given moments and shape constraints. The numerical
solutions for the extremal distributions appear to be the same as those
in which σ is the objective. The extremal blocking probabilities are
given in Table IX. As in Section V and Table X of Part I, the
constraints pin down the delay probability σ better than the associated
blocking probability φ(μ).


V. OTHER BOUNDS AND APPROXIMATIONS FOR GI/G/1 QUEUES 


Having obtained the extreme values of the GI/M/1 mean queue
length L given two moments and various shape constraints, we note
how these results compare with other bounds and approximations for
the GI/G/1 queue that depend only on the first two moments of the
interarrival times and service times. Several of these other bounds and
approximations are defined and compared in Shanthikumar and
Buzacott. These bounds are stated for the mean waiting time, but they
are easily translated into the mean queue length by Little’s formula.
Among the bounds and approximations treated there are the Kingman
upper bound and the Marchal approximation based on it. Recently,
Daley obtained a better upper bound, (1.5) there, which can be used
to produce an approximation by scaling to make the M/G/1 case exact,
just as Marchal did for the Kingman bound. We call this new
approximation Marchal (D) and the original Marchal approximation
Marchal (K). Shanthikumar and Buzacott also discuss the Kraemer and
Langenbach-Belz approximation and a modification of Page’s
approximation based on it, formula (8) there, which we call Modified-Page.
They also discuss an approximation by Sakasegawa and Yu, which
coincides with the monotone-failure-rate approximation in Whitt.
Another natural two-moment approximation is to fit a hyperexponential
distribution with balanced means to the two moments, provided
c² ≥ 1 (see Section III of Whitt), and solve the resulting H₂/H₂/1
queue via a vector-state Markov process. When a distribution is
exponential, H₂ becomes M, so for the setting of the GI/M/1 queue
based on Prototype I we obtain the H₂/M/1 queue. Finally, the crudest


Table IX—The extremal blocking probabilities for the
associated GI/M/1 loss system (the transform values
φ(μ)) with given moments and shape constraints: case
of Prototype I with m₁ = 2, m₂ = 12, m₃ = 119, and
1/μ = 4/3 (ρ = 2/3)

                                          The moment parameters
The shape constraints                     m₁, m₂      m₁, m₂, m₃
Max φ(μ), no shape constraints            0.6641      0.5902
Max φ(μ), unimodal                        0.6225      0.5414
Max φ(μ), log-convex                      0.5499      0.5147
Prototype I                               0.5098      0.5098
Min φ(μ), log-convex                      0.5026      0.5087
Min φ(μ), unimodal                        0.4395      0.4713
Min φ(μ), no shape constraints            0.3279      0.4049
approximation is obtained by ignoring the second moments and using
the M/M/1 formula L = ρ/(1 − ρ). There is also a related collection
of approximations arising from diffusion approximations that we will
not consider here; see Whitt and references there.
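Several of the simpler two-moment formulas behind Table X take only a few lines each. The Python sketch below uses the commonly quoted textbook forms of the Kingman bound, the Marchal (K) approximation, the Kraemer and Langenbach-Belz approximation, and the M/M/1 formula, with L taken as the mean number in system (λ times the mean wait in queue, plus ρ). These forms are assumptions in the sense that this paper does not state them, although they come out close to the corresponding Table X rows. The Daley, Modified-Page, and Sakasegawa-Yu entries are not attempted. Here c_a² and c_s² are the squared coefficients of variation of the interarrival and service times (c_s² = 1 for exponential service), and the function names are hypothetical.

```python
import math

def _params(rho, mean_a):
    """Arrival rate and mean service time from rho and the mean interarrival time."""
    lam = 1.0 / mean_a
    return lam, rho * mean_a

def kingman_L(ca2, cs2, rho, mean_a):
    """Kingman upper bound on the mean wait, converted to L = lam * Wq + rho."""
    lam, mean_s = _params(rho, mean_a)
    wq = lam * (ca2 * mean_a ** 2 + cs2 * mean_s ** 2) / (2.0 * (1.0 - rho))
    return lam * wq + rho

def marchal_k_L(ca2, cs2, rho, mean_a):
    """Marchal's scaling of the Kingman bound, exact for M/G/1."""
    lam, _ = _params(rho, mean_a)
    wq = (rho ** 2 * (1.0 + cs2) / (2.0 * lam * (1.0 - rho))
          * (ca2 + rho ** 2 * cs2) / (1.0 + rho ** 2 * cs2))
    return lam * wq + rho

def klb_L(ca2, cs2, rho, mean_a):
    """Kraemer and Langenbach-Belz approximation, also exact for M/G/1."""
    lam, mean_s = _params(rho, mean_a)
    if ca2 <= 1.0:
        g = math.exp(-2.0 * (1.0 - rho) * (1.0 - ca2) ** 2 / (3.0 * rho * (ca2 + cs2)))
    else:
        g = math.exp(-(1.0 - rho) * (ca2 - 1.0) / (ca2 + 4.0 * cs2))
    wq = mean_s * rho / (1.0 - rho) * (ca2 + cs2) / 2.0 * g
    return lam * wq + rho

def mm1_L(rho):
    """The crude M/M/1 approximation that ignores second moments."""
    return rho / (1.0 - rho)

# Prototype I: c_a^2 = 2.0, mean interarrival time 2, exponential service.
for rho in (2.0 / 3.0, 0.9):
    print(rho, round(kingman_L(2.0, 1.0, rho, 2.0), 2),
          round(marchal_k_L(2.0, 1.0, rho, 2.0), 2),
          round(klb_L(2.0, 1.0, rho, 2.0), 2), round(mm1_L(rho), 2))
```

Both approximations reduce to the Pollaczek-Khinchine formula when c_a² = 1, which is the M/G/1 exactness property mentioned above for the Marchal scaling.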

We also include bounds for GI/G/1 queues in which the interarrival-time
distribution is IFR or DFR. Marshall obtained a lower bound
for IFR/G/1 queues and an upper bound for DFR/G/1 queues. Stoyan
and Stoyan also obtained an upper bound for IFR/G/1 queues and a
lower bound for DFR/G/1 queues, which is just the M/G/1 queue with
the given arrival rate. (In fact, the interarrival-time distribution is
only required to be NBUE or NWUE, i.e., new better or worse than
used in expectation.) The DFR bounds, but not the IFR bounds, are
tight.

In Table X these various bounds and approximations are compared
with the extreme values of L for c² = 2.0 and 0.8 (the two prototype
distributions) and ρ = 2/3 and 9/10. When interpreting these results,
note that none of the other bounds and approximations use the fact
that the service-time distribution is exponential. Also, the DFR and
the IFR bounds are based on interarrival-time distributions having
densities with support on the entire positive half line, whereas the
bounds obtained here in Section II are based on interarrival-time
distributions with support {0, 1, ..., 20}. The upper bound b = 20 on
the support of the interarrival-time distribution has a significant
impact on the lower bound of the mean queue length, L_L, when the
interarrival-time distribution is DFR (c² > 1) and on the upper bound
of the mean queue length, L_U, when the interarrival-time distribution
is IFR (c² < 1).

The first conclusion is that all the approximations, with the
exception of the M/M/1 approximation when c² = 2.0, appear to be within
the range of reasonable values for actual GI/M/1 systems. However,
for c² = 2.0 and ρ = 2/3, the Modified-Page approximation seems a
bit high. The Kraemer and Langenbach-Belz approximations for c² =
2.0 seem low compared to the log-convex discrete lower bounds (Case
5), but note that the Kraemer and Langenbach-Belz approximations
are close to the H₂/M/1 values.

The second conclusion is that the D/M/1 lower bound and the
Kingman and Daley upper bounds are not close enough to be good
approximations. Of course, the upper bounds are asymptotically tight
in heavy traffic, so they are not too bad when ρ = 0.9.

We believe that the shape constraints play a very useful role. They
narrow down the range of possible values for L, so it is reasonable to
consider approximations based on two moments only. Instead of
concluding that it is not possible to obtain a good approximation when
c² > 1 (p. 765 of Shanthikumar and Buzacott), we conclude that it is
Table X—A comparison of the GI/M/1 extreme values of the mean
queue length, L, with other bounds and approximations for L in
GI/G/1 queues that depend on the first two moments of the
interarrival times and service times

                                   Prototype Distribution I,    Prototype Distribution II,
                                   c² = 2.0                     c² = 0.8
                                   ρ = 2/3      ρ = 9/10        ρ = 2/3      ρ = 9/10
GI/G/1 upper bounds
  Kingman                          4.33         14.95           2.53         8.95
  Daley                            4.00         14.85           2.40         8.91
  DFR or IFR                       3.00         13.50           2.00         9.00
GI/M/1 upper bounds
  Case 1, two moments only         3.44         14.06           2.06         8.41
  Case 2, bound on support         3.44         14.06           2.06         8.41
  Case 3, discrete                 3.44         13.98           2.06         8.38
  Case 4, unimodal                 3.30         13.85           1.97         8.29
  Case 5, log-concave or           3.02         13.55           1.92         8.25
    log-convex
Approximations
  Marchal (K)                      2.92         13.48           1.82         8.11
  Marchal (D)                      2.89         13.46           1.82         8.11
  Kraemer and L-B                  2.56         12.85           1.86         8.18
  Modified-Page                    3.28         13.34           1.83         8.13
  Sakasegawa and Yu                2.67         13.05           1.87         8.19
  H₂/M/1                           2.64         13.03           —            —
  M/M/1                            2.00         9.00            2.00         9.00
GI/M/1 lower bounds
  Case 5, log-concave or           2.80         13.24           1.85         8.16
    log-convex
  Case 4, unimodal                 2.47         12.97           1.73         8.02
  Case 3, discrete                 1.96         12.30           1.56         7.83
  Case 2, bound on support         1.88         12.16           1.55         7.81
  Case 1, two moments only         1.14         4.66            1.14         4.66
GI/G/1 lower bound
  DFR or IFR                       2.00         9.00            1.80         8.00

Note: The actual values of L for the prototype distributions are 2.87, 13.31, 1.87, and
8.19, respectively.


possible to consider approximations based on two moments, with the 
caveat that the distributions should not be too irregular. 


VI. MATHEMATICAL PROGRAMMING ISSUES 


Solving the nonlinear programs turned out to be quite complicated,
especially when the shape constraints were included. The programs
involve 22 variables and up to 46 constraints, which is a reasonably
large problem for most general-purpose nonlinear programming codes.
In addition, when the nonlinear constraints (6) are present, the
problems apparently become ill-conditioned and poorly scaled, causing
numerical difficulties and, often, nonconvergence of standard
nonlinear programming algorithms.

The numerical results reported in this paper were obtained using
two nonlinear programming codes from the Harwell Subroutine
Library, compiled by the Numerical Analysis Group at the United
Kingdom Atomic Energy Authority. These codes were VF01AD, an
augmented Lagrangian code described in Fletcher, and VF02AD, a
quadratic approximation code due to Powell. They were run in double
precision on an Amdahl 470/V6 computer running under the Multiple
Virtual Storage (MVS) operating system. Both codes are included in a
recent performance comparison of available state-of-the-art computer
codes compiled by Schittkowski.

Although numerical problems were experienced with both codes,
they were far more prevalent with the augmented Lagrangian code
VF01AD. The augmented Lagrangian code solves a sequence of
unconstrained optimization subproblems. Unfortunately, some of the
problems (especially for ρ = 0.9 and Prototype II) resulted in
ill-conditioned subproblems and, occasionally, subproblems with an
unbounded optimum, in which case the augmented Lagrangian code did
not converge. Our experience bears out that of Schittkowski,
who reported that the performance of this code deteriorates drastically
for ill-conditioned problems and is highly sensitive to slight variations
of the problem. Certain standard measures, however, were able to
overcome the numerical difficulties in most instances. For example,
in some runs, the default settings for certain penalty parameters were
overridden according to rules of thumb suggested by Gill, Murray, and
Wright (see pp. 295-296 of Ref. 24).

Fortunately, for those experiments for which code VF01AD did not
obtain a solution, code VF02AD did. This supports Schittkowski’s
conclusion that code VF02AD is one of the most robust and reliable
codes available. Even though several individual runs experienced
numerical overflows and underflows and eventual nonconvergence, it
was always possible eventually to obtain convergence with this
quadratic approximation code from some starting point. In particular,
problems with ρ = 0.9 and Prototype II were solved with less difficulty
using VF02AD.

All runs were tried with a variety of starting points. These starting 
points included the prototype distributions, the uniform distribution, 
solutions obtained for other parameter settings, and an initial all-zero 
solution. 

In summary, then, although this nonlinear programming method
for analyzing the quality of queueing approximations provides
considerable insight and potential for future applications, great care must be
exercised in the solution of the nonlinear programs. Our experience
indicates that the computer code, the parameter settings, the starting
points, and the scaling of the variables must be chosen judiciously in
order to obtain useful results.
On Approximations for Queues, III: Mixtures of
Exponential Distributions
To evaluate queueing approximations based on a few parameters (e.g., the
first two moments) of the interarrival-time and service-time distributions, we
examine the set of all possible values of the mean queue length given this
partial information. In general, the range of possible values given such partial
information can be large, but if in addition shape constraints are imposed on
the distributions, then the range can be significantly reduced. The effect of
shape constraints on the interarrival-time distribution in a GI/M/1 queue was
investigated in Part II (see “On Approximations for Queues, II: Shape
Constraints,” this issue) by restricting attention to discrete probability
distributions with all mass on a fixed finite set of points and then solving
nonlinear programs. In this paper we show how one kind of shape constraint,
namely that the distribution is a mixture of exponential distributions, can be
examined analytically. By considering GI/G/1 queues in which both the
interarrival-time and service-time distributions are mixtures of exponential
distributions with specified first two moments, we show that additional
information about the distributions is more important for the interarrival time
than for the service time.


I. INTRODUCTION AND SUMMARY 


Many approximations for the mean steady-state queue length in the
GI/G/1 queue are based on the first two moments of the general
interarrival-time and service-time distributions. To evaluate these
approximations, it is natural to compare the approximations with the
set of possible values of the mean queue length given this limited
moment information. For several special cases, the minimum and
maximum values of the mean queue length are attained by simple
two-point extremal distributions. In Part I the extremal distributions
were used to calculate the extreme values of the mean queue length in
the GI/M/1 queue and to show how they depend on the traffic intensity,
the second moment of the interarrival-time distribution, and an upper
bound on the distribution. Extremal distributions were also used to
compare different parameters for approximations.

Unfortunately, the range of possible values of the mean queue length
in the GI/M/1 queue given this limited moment information can be
very wide. However, since the extremal interarrival-time distributions
are quite unusual, this still leaves the possibility that the range would
not be too wide for typical distributions. Part II showed that the range
of possible values for the mean queue length in the GI/M/1
queue can indeed be reduced dramatically by imposing shape
constraints such as unimodality and log-convexity on the
interarrival-time distributions with given first two moments. This was
done by restricting attention to discrete distributions with all mass on
a fixed finite set of points and solving nonlinear programs.

Unlike Part I, the approach in Part II was computational, based on 
nonlinear programs. However, the extremal distributions obtained 
from the nonlinear programs exhibit regularity that suggests the 
possibility of an analytic treatment similar to Part I. This paper sets 
out to treat analytically one kind of shape constraint. We show that 
the theory underlying Part I also applies to mixtures of exponential 
distributions. Within this class of distributions there are extremal 
distributions with respect to the same partial orderings based on 
Laplace transforms used in Part I. The extremal distributions in this 
class of mixtures are obtained by using the extremal distributions of 
Part I as the mixing distributions. These extremal distributions yield
the minimum and maximum mean queue length as interarrival-time
distributions in the GI/M/1 queue and as service-time distributions in
the K₂/G/1 queue (K₂ denoting interarrival-time distributions having a
rational Laplace-Stieltjes transform with a denominator of degree 2; see
Section V of Part I and Section VII here).

The rest of this paper is organized as follows. In Section II, we 
briefly review the theory yielding distributions that minimize or max- 
imize the Laplace-Stieltjes transforms for all arguments. In Section 
III, we show that this theory applies to mixtures of exponential 
distributions, and in Section IV, we apply the results to the H/M/1 
queue, having interarrival-time distributions that are mixtures of 
exponential distributions. In Section V, we examine the case of H₂
interarrival-time distributions (mixtures of two exponential
distributions) in more detail. In Section VI, we indicate how some of the
results for H/M/1 queues extend to GI/G/1 queues whose interarrival-time
distributions have increasing mean residual life (the service-time distribution
is general instead of exponential and the interarrival-time distribution
can be more general than a mixture of exponentials). Finally, in
Section VII, we indicate how the ordering of transforms can be applied
to compare different service-time distributions in K₂/H/1 queues.
There, Table III gives a good picture of the way the mean queue length 
can vary in a large class of GI/G/1 queues with the first two moments 
of the interarrival time and the service time specified. 


II. EXTREME VALUES OF THE LAPLACE-STIELTJES TRANSFORM


As in Eckberg³ and references there, we obtain the extremal
distributions for queues with specified moments for the interarrival times
and service times from extremal distributions for the Laplace-Stieltjes
transform. For the transform, the object is to find a cdf (cumulative
distribution function) F with support on the interval [0, bm₁], b < ∞,
to minimize or maximize the transform φ(s), defined by

$$\phi(s) = \int_0^{bm_1} e^{-st}\,dF(t), \qquad s \ge 0, \qquad (1)$$

subject to moment constraints

$$m_j = \int_0^{bm_1} t^j\,dF(t), \qquad (2)$$

for j = 1, 2, ..., n. The key idea is to apply the theory of Tchebycheff
systems in Karlin and Studden,⁴ which implies that the optimization
problem involving (1) and (2) has a very nice solution. First, the
minimizing and maximizing cdf's are independent of the variable s in
the transform φ(s). Second, the extremal distributions are discrete
distributions with positive mass on at most (n + 2)/2 mass points.
Finally, the points with positive mass and the associated probability
masses are obtained simply by solving a system of linear equations.
(See Section 2 of Eckberg³ and Section II of Part I¹ for more
discussion.)


III. MIXTURES OF EXPONENTIAL DISTRIBUTIONS 


Now we consider the optimization problem in Section II for distri- 
butions that are mixtures of exponential distributions. It turns out 
that the theory of Tchebycheff systems can be applied again because 
the extremal distributions in this class of mixtures can be obtained by 
using extremal mixing distributions. 

A cdf F is a mixture of exponential distributions if it satisfies

$$1 - F(x) = \int_0^\infty e^{-x/t}\,dG(t), \qquad x \ge 0, \qquad (3)$$

for some mixing cdf G. Densities of mixtures of exponential
distributions are also called completely monotone; see Section 5.4 of Keilson.⁵
A density f has this property if and only if it has derivatives f⁽ⁿ⁾ of all
orders n and (−1)ⁿf⁽ⁿ⁾(x) ≥ 0 for all x and n. Mixtures of exponentials
are log-convex (see Part II) and thus are DFR (have decreasing failure
rate).

It turns out that the moments and transform of F are easily 
expressed via G: 


$$m_k(F) = \int_0^\infty t^k\,dF(t) = k!\int_0^\infty t^k\,dG(t) = k!\,m_k(G) \qquad (4)$$

and

$$\phi(s) = \int_0^\infty e^{-sx}\,dF(x) = \int_0^\infty (1 + st)^{-1}\,dG(t). \qquad (5)$$

Moreover, the functions 1, t, 2t², ..., (k!)tᵏ, (1 + st)⁻¹ form a complete
Tchebycheff system, so extremal distributions F within the class of
mixtures are obtained by using the associated extremal mixing cdf's
G. If the first n moments of F are specified as m₁, m₂, ..., mₙ, then
the first n moments of G are m₁, m₂/2, ..., mₙ/n!.
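Equations (4) and (5) can be checked directly when the mixing cdf G has finitely many atoms. The following is a minimal sketch, assuming G is supplied as a list of (probability, mean) pairs; `mixture_moments` and `mixture_transform` are hypothetical helper names, not from the paper.

```python
import math

def mixture_moments(mix, n):
    """First n moments of F from eq. (4): m_k(F) = k! * m_k(G).

    `mix` is a list of (probability, mean) pairs describing a discrete
    mixing cdf G; each pair contributes an exponential with that mean.
    """
    return [math.factorial(k) * sum(w * t ** k for w, t in mix)
            for k in range(1, n + 1)]

def mixture_transform(mix, s):
    """Laplace-Stieltjes transform of F from eq. (5): sum of w / (1 + s*t)."""
    return sum(w / (1.0 + s * t) for w, t in mix)

# Equal-weight mixture of exponentials with means 0.5 and 1.5.
mix = [(0.5, 0.5), (0.5, 1.5)]
print(mixture_moments(mix, 3))    # [1.0, 2.5, 10.5]
print(mixture_transform(mix, 1.0))
```

A single atom reduces to an ordinary exponential, for which (5) gives 1/(1 + sm₁).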

First suppose that the first two moments of F are specified as m₁ and
m₂. Let c² be the squared coefficient of variation of F, i.e., c² = (m₂ −
m₁²)/m₁². Also require that the mixing cdf G has support on the interval
[0, bm₁], b < ∞. Then the extremal distributions are:

1. Upper bound: the two-point mixture with mass (c² − 1)/(c² + 1) on 0
and mass 2/(c² + 1) on the exponential distribution with mean
m₁(1 + c²)/2, which has cdf

$$F_u(x) = 1 - \frac{2}{1 + c^2}\exp\left(-\frac{2x}{m_1(1 + c^2)}\right), \qquad x \ge 0, \qquad (6)$$

and

2. Lower bound: the mixture of two exponential distributions, one
having mean bm₁ with probability (c² − 1)/[c² − 1 + 2(b − 1)²] and
the other having mean m₁[1 − (c² − 1)/2(b − 1)] with probability
2(b − 1)²/[c² − 1 + 2(b − 1)²]; the cdf is

$$F_l(x) = 1 - \frac{(c^2 - 1)\,e^{-x/(bm_1)} + 2(b - 1)^2\,e^{-x/\{m_1[1 - (c^2 - 1)/(2(b - 1))]\}}}{c^2 - 1 + 2(b - 1)^2}, \qquad x \ge 0. \qquad (7)$$

As b → ∞, the lower bound converges in law to

3. Limiting lower bound: the exponential distribution with mean
m₁, having cdf F_L(x) = 1 − exp(−x/m₁), x ≥ 0.
The upper bound cdf F_u may not be considered a mixture of
exponential distributions because of the atom at 0, but the atom at 0 can
be thought of as an exponential distribution having mean 0.
Alternatively, F_u can be realized as the limit in distribution of mixtures of
two exponential distributions having means λ₁′ and λ₂′ and the specified
first two moments, where λ₁′ → 0 and λ₂′ → m₁(1 + c²)/2.

Let φ_L(s), φ_l(s), and φ_u(s) be the transforms of the extremal cdf's
F_L, F_l, and F_u, respectively. The theory of Tchebycheff systems
implies that

$$\phi_L(s) \le \phi_l(s) \le \phi(s) \le \phi_u(s) \qquad (8)$$

for all s and all transforms φ of cdf's F of the form (3) having first
two moments m₁ and m₂ (for the inequality involving φ_l, with mixing
cdf G supported on [0, bm₁]).
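As an illustration of (6) through (8), the sketch below builds the mixing distributions of F_u, F_l, and F_L as (probability, mean) pairs and checks numerically that the first two share the specified moments and that the three transforms are ordered as in (8). The parameter values m₁ = 2, c² = 2, b = 5 and all function names are illustrative assumptions.

```python
def upper_mixing(m1, c2):
    """Mixing distribution of F_u in (6): mass (c2-1)/(c2+1) on mean 0
    (the atom of F at 0) and mass 2/(c2+1) on mean m1*(1+c2)/2."""
    return [((c2 - 1) / (c2 + 1), 0.0), (2 / (c2 + 1), m1 * (1 + c2) / 2)]

def lower_mixing(m1, c2, b):
    """Mixing distribution of F_l in (7) for G confined to [0, b*m1].
    Requires b > 1 + (c2 - 1)/2 so that the second mean is positive."""
    denom = c2 - 1 + 2 * (b - 1) ** 2
    return [((c2 - 1) / denom, b * m1),
            (2 * (b - 1) ** 2 / denom, m1 * (1 - (c2 - 1) / (2 * (b - 1))))]

def limiting_lower_mixing(m1):
    """F_L: the exponential distribution with mean m1 (a single atom)."""
    return [(1.0, m1)]

def transform(mix, s):
    """phi(s) from eq. (5) for a discrete mixing distribution."""
    return sum(w / (1.0 + s * t) for w, t in mix)

def first_two_moments(mix):
    """(m1, m2) of F via eq. (4)."""
    return (sum(w * t for w, t in mix), 2 * sum(w * t * t for w, t in mix))

m1, c2, b = 2.0, 2.0, 5.0
for s in (0.1, 0.5, 1.0, 2.0, 5.0):
    lo = transform(limiting_lower_mixing(m1), s)
    mid = transform(lower_mixing(m1, c2, b), s)
    hi = transform(upper_mixing(m1, c2), s)
    print(f"s={s}: {lo:.4f} <= {mid:.4f} <= {hi:.4f}")
```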

Remark: It is no doubt possible to study extremal distributions for 
other kinds of mixtures, but we have not. Mixtures of exponentials
seem particularly appropriate for the queueing application. 


IV. THE H/M/1 QUEUE 


The results of Section III apply immediately to GI/M/1 queues in 
which the interarrival-time distribution is a mixture of exponential 
distributions; see Section II of Part I. Since the mixture of k
exponential distributions is called hyperexponential and is denoted by H_k, we
use H to refer to interarrival-time distributions that are general
mixtures of exponentials.

Note that the upper bound cdf F_u in (6), as an interarrival-time
distribution, corresponds to a batch Poisson arrival process with
geometrically distributed batches having mean m_B = (1 + c²)/2 and
squared coefficient of variation c_B² = (m_B − 1)/m_B. Let M^B represent
a batch Poisson arrival process. Of course, the limiting lower bound
corresponds to a Poisson arrival process with intensity 1/m₁. What
we obtain is the ordering

$$\mathrm{M/M/1} \le \mathrm{H/M/1} \le \mathrm{M^B/M/1}, \qquad (9)$$

which means that the mean queue lengths (expected number in the
system, including any in service) are ordered, and in fact the entire
steady-state queue-length distributions are stochastically ordered as
in (9), provided the traffic intensity ρ and the squared coefficient of
variation of the interarrival-time distribution, c², are fixed. We obtain
these orderings because, in the case of exponential service-time
distributions, the entire steady-state queue-length distribution depends only
on the traffic intensity ρ, which is fixed, and the root σ in the interval
(0, 1) of the equation

$$\phi[\mu(1 - \sigma)] = \sigma, \qquad (10)$$

where μ is the service rate.
It is easy to see that the queue-length distributions are
stochastically ordered, i.e.,

$$P(Q_1 \ge k) \le P(Q_2 \ge k) \quad \text{for all } k \ge 0, \qquad (11)$$

if the roots satisfy σ₁ ≤ σ₂, because in the GI/M/1 queue
P(Q ≥ k) = ρσᵏ⁻¹ for k ≥ 1. Moreover, it is easy to see that the roots
are ordered if the transforms are ordered in the sense of (8).
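Equation (10) has a unique root in (0, 1) when ρ < 1, so a simple bisection suffices. This sketch assumes the interarrival cdf is a finite mixture of exponentials given as (probability, mean) pairs and that the service rate is set by ρ = 1/(μm₁); the function names are hypothetical. It also returns the mean number in system L = ρ/(1 − σ), which follows from P(Q ≥ k) = ρσᵏ⁻¹.

```python
def transform(mix, s):
    """phi(s) = sum of w / (1 + s*t) for a mixture of exponentials, eq. (5)."""
    return sum(w / (1.0 + s * t) for w, t in mix)

def gim1_root(mix, rho, tol=1e-12):
    """Root sigma in (0, 1) of phi[mu(1 - sigma)] = sigma, eq. (10).

    g(sigma) = phi(mu(1 - sigma)) - sigma is convex with g(0) > 0 and
    g(1) = 0; for rho < 1 it is negative just below 1, so bisection
    isolates the unique root in (0, 1).
    """
    m1 = sum(w * t for w, t in mix)
    mu = 1.0 / (rho * m1)            # service rate giving traffic intensity rho
    g = lambda sig: transform(mix, mu * (1.0 - sig)) - sig
    lo, hi = 0.0, 1.0 - 1e-9         # g(lo) > 0 > g(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gim1_mean_queue(mix, rho):
    """Mean number in system, L = rho / (1 - sigma)."""
    return rho / (1.0 - gim1_root(mix, rho))

# Poisson arrivals (M/M/1): the root equals rho.
print(gim1_root([(1.0, 2.0)], 0.7))
```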

Let σ_u and L_u be the probability of delay and mean queue length in
the H/M/1 queue with interarrival-time distribution F_u, and similarly
for F_l and F_L. Here are the main results:

Theorem 1: For an H/M/1 queue with traffic intensity ρ and
interarrival-time squared coefficient of variation c²,

$$\sigma_L = \rho \quad \text{and} \quad \sigma_u = 1 - \frac{2(1 - \rho)}{1 + c^2}, \qquad (12)$$

so that

$$L_L = \frac{\rho}{1 - \rho}, \qquad L_u = \frac{L_L(1 + c^2)}{2}, \qquad (13)$$

and the maximum relative error (MRE) is

$$\mathrm{MRE} = \frac{L_u - L_L}{L_L} = \frac{\sigma_u - \sigma_L}{1 - \sigma_u} = \frac{c^2 - 1}{2}. \qquad (14)$$

Proof: Since σ = ρ for an M/M/1 queue, σ_L = ρ. For σ_u, follow the
proof of Theorem 2 in Part I, making the change of variables (1 −
σ_L) = (1 − σ_u)(1 + c²)/2.

From Corollary 1 of Part I and Theorem 1, we see that the shape
constraint reduces the maximum relative error from c² to (c² − 1)/2.
If c² is near its lower limit 1 for mixtures of exponentials, then of
course the MRE is very small.
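The closed forms in Theorem 1 are easy to tabulate. A minimal sketch (the function name is hypothetical); its output for ρ = 0.7, c² = 2.0 and for ρ = 0.9, c² = 12.0 agrees with the upper-bound and lower-bound rows of Tables I and II in Section V.

```python
def theorem1(rho, c2):
    """Extreme values from Theorem 1 for the H/M/1 queue."""
    sigma_L = rho                                     # exponential interarrivals
    sigma_u = 1.0 - 2.0 * (1.0 - rho) / (1.0 + c2)    # batch-Poisson extreme, (12)
    L_L = rho / (1.0 - rho)                           # eq. (13)
    L_u = L_L * (1.0 + c2) / 2.0
    mre = (L_u - L_L) / L_L                           # eq. (14): (c2 - 1) / 2
    return sigma_L, sigma_u, L_L, L_u, mre

for rho in (0.3, 0.7, 0.9):
    print(rho, [round(v, 4) for v in theorem1(rho, 2.0)])
```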

Given the first two moments, the upper bound is hard and the lower
bound is soft: the upper bound depends on c²; the lower bound does
not. The upper bound is not improved by specifying the third moment;
the lower bound is. From Section IV of Part I, we see that the extremal
distributions given three moments are two-point mixtures of
exponentials:

Theorem 2: For H/M/1 queues, specifying the third moment of the
interarrival-time distribution in addition to the first two does not change
the upper bound cdf F_u, and makes the lower bound cdf F_l the unique
H₂ distribution (two-point mixing distribution) specified by these three
parameters.

The formulas for calculating H₂ parameters given the first three
moments are given in (3.5) and (3.6) of Ref. 6.
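Theorem 2 requires the unique H₂ distribution with three given moments. The sketch below fits it through the two-point mixing distribution G, whose moments are m_k/k! by (4); this is our own route, assumed equivalent to, but not copied from, (3.5) and (3.6) of Ref. 6, and the function name is hypothetical. It rejects moment sets for which no H₂ with positive means exists; at the boundary m₃ = 3m₂²/(2m₁) the smaller mean collapses to 0, which is the atom of F_u. With the moments of Example 1 it reproduces the lower-bound cdf quoted there.

```python
import math

def h2_from_three_moments(m1, m2, m3):
    """Fit the H2 density (15) to three moments; returns (p1, lam1, lam2)
    with lam1 > lam2, so component 1 has the smaller mean as in (16)."""
    g1, g2, g3 = m1, m2 / 2.0, m3 / 6.0     # moments of the mixing cdf G
    det = g2 - g1 * g1                      # variance of G; > 0 iff c^2 > 1
    if det <= 0:
        raise ValueError("need c^2 > 1 for a proper mixture of exponentials")
    # The two atoms of G are the roots of t^2 + a1*t + a0 = 0, where
    # [1 g1; g1 g2] [a0; a1] = -[g2; g3]  (solved by Cramer's rule).
    a0 = (-g2 * g2 + g1 * g3) / det
    a1 = (-g3 + g1 * g2) / det
    disc = a1 * a1 - 4.0 * a0
    if disc < 0:
        raise ValueError("moments do not correspond to a two-point G")
    t_big = (-a1 + math.sqrt(disc)) / 2.0
    t_small = (-a1 - math.sqrt(disc)) / 2.0
    if t_small <= 0:
        raise ValueError("m3 too small: no H2 with positive means")
    w_big = (g1 - t_small) / (t_big - t_small)   # probability on the larger mean
    return 1.0 - w_big, 1.0 / t_small, 1.0 / t_big

print(h2_from_three_moments(2.00, 12.00, 119.01))
```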


Example 1: Consider an interarrival-time distribution with moments
m₁ = 2.00, m₂ = 12.00, and m₃ = 119.01, which are the moments of
Prototype Distribution I in Part II. With mixtures of exponential
distributions, the upper bound cdf is

$$F_u(x) = 1 - 0.6667\,e^{-x/3}, \qquad x \ge 0,$$

and the lower bound cdf is

$$F_l(x) = 1 - 0.5146\,e^{-0.2964x} - 0.4854\,e^{-1.8385x}, \qquad x \ge 0.$$

From Theorem 1, given just the first two moments, σ_L = 0.6667 and
σ_u = 0.7778 for ρ = 0.6667, and σ_L = 0.9000 and σ_u = 0.9333 for ρ =
0.9000. From Theorem 2, also specifying the third moment changes
the lower bound to σ_l = 0.76705 for ρ = 0.6667 and σ_l = 0.93259 for
ρ = 0.9000. To get these, we solved the appropriate H₂/M/1 queue.
Imposing the shape constraint in addition to the first two moments
reduced the MRE from c² = 2.0 to (c² − 1)/2 = 0.50. Also specifying
the third moment further reduces the MRE to 0.048 when ρ = 2/3 and
0.011 when ρ = 9/10.


V. MIXTURES OF TWO EXPONENTIALS: H₂ DISTRIBUTIONS


Mixtures of two exponential distributions, i.e., H₂ distributions, play
a key role in many approximations. This is a three-parameter
distribution with density

$$f(x) = p_1\lambda_1 e^{-\lambda_1 x} + p_2\lambda_2 e^{-\lambda_2 x}, \qquad x \ge 0, \qquad (15)$$

where p₂ = 1 − p₁. Instead of the three parameters p₁, λ₁, and λ₂, one
may choose to work with the first three moments m₁, m₂, and m₃, or
the mean m₁, the squared coefficient of variation c², and the proportion
r of the total mean in the component with the smaller mean, defined
by

$$r = \frac{p_1/\lambda_1}{m_1} = \frac{p_1/\lambda_1}{(p_1/\lambda_1) + (p_2/\lambda_2)}, \qquad (16)$$

where λ₁ > λ₂. Given the parameters p₁, λ₁, and λ₂, it is easy to calculate
any of the other parameters. The formulas for p₁, λ₁, and λ₂ given the
first three moments appear in (3.5) and (3.6) of Ref. 6. Given m₁, c²,
and r, we have m₂ = m₁²(c² + 1), p₁ = rm₁λ₁, λ₂ = (1 − rm₁λ₁)/[(1 − r)m₁], and

$$\lambda_1 = \frac{-B + \sqrt{B^2 - 4AC}}{2A}, \qquad (17)$$

where A = rm₁m₂/2, −B = (m₂/2) + (rm₁)² − (1 − r)²m₁², and C = rm₁.
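Equation (17) translates directly into code. A minimal sketch with hypothetical function names; for m₁ = 1, c² = 2, r = 1/2 it returns the parameters used in Example 2 and the third moment 18.0 listed in Table I.

```python
import math

def h2_from_m1_c2_r(m1, c2, r):
    """H2 parameters (p1, lam1, lam2) of (15) from the mean m1, the squared
    coefficient of variation c2, and the proportion r of eq. (16), via (17)."""
    m2 = m1 ** 2 * (c2 + 1.0)
    A = r * m1 * m2 / 2.0
    B = -((m2 / 2.0) + (r * m1) ** 2 - (1.0 - r) ** 2 * m1 ** 2)
    C = r * m1
    lam1 = (-B + math.sqrt(B * B - 4.0 * A * C)) / (2.0 * A)   # larger root
    p1 = r * m1 * lam1
    lam2 = (1.0 - p1) / ((1.0 - r) * m1)
    return p1, lam1, lam2

def h2_moment(p1, lam1, lam2, k):
    """k-th moment of the H2 density (15): k! (p1/lam1^k + p2/lam2^k)."""
    return math.factorial(k) * (p1 / lam1 ** k + (1.0 - p1) / lam2 ** k)

p1, lam1, lam2 = h2_from_m1_c2_r(1.0, 2.0, 0.5)
print(p1, lam1, lam2)
print(h2_moment(p1, lam1, lam2, 3))   # m3 for the balanced-means case
```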

For two-moment approximations based on H₂ distributions, one of
the three parameters is often eliminated by setting r = 1/2; see Section
3.1 of Ref. 6. The range of all possible values given the first two
moments is indicated in Section IV, since both the upper and lower
bounds are (possibly degenerate) H₂ distributions. Since this range is rather wide, it is
natural to ask how the distribution and the GI/M/1 queue
characteristics vary with the third parameter, either r or m₃. For what values
of r is the approximation by r = 1/2 reasonable?




In order to answer this question, we have calculated the third
moment m₃ and the queue characteristics σ and L for two values of c²
(2 and 12), three values of ρ (0.3, 0.7, and 0.9), and thirteen values of
r (ranging from 0.001 to 0.999). The results appear in Tables I and II.

For c² = 2.0, the approximation by r = 1/2 appears quite robust.
For r in the interval [0.2, 0.8], the maximum relative error is 15.8
percent, 15.7 percent, and 6.1 percent for ρ = 0.3, 0.7, and 0.9,
respectively. Very large values of r greatly extend the range.


Table I—The possible third parameters and queue characteristics for
an H₂/M/1 queue given c² = 2.0 with ρ = 0.3, 0.7, and 0.9

Proportion of       Skewness,    Key Root, Probability        Mean Queue Length, L
Total Mean in       Third        of Delay, σ
Component With      Moment,
Smaller Mean, r     m₃/m₁³       ρ=0.3    ρ=0.7    ρ=0.9      ρ=0.3    ρ=0.7    ρ=0.9
Upper bound            13.5      0.5333   0.8000   0.9333     0.643    3.500   13.500
0.001                  13.5      0.5323   0.7999   0.9333     0.641    3.499   13.499
0.01                   13.6      0.5230   0.7992   0.9333     0.629    3.486   13.486
0.10                   14.6      0.4627   0.7933   0.9327     0.558    3.386   13.381
0.20                   15.4      0.4280   0.7885   0.9323     0.525    3.309   13.291
0.30                   16.2      0.4059   0.7842   0.9319     0.505    3.244   13.210
0.40                   17.1      0.3894   0.7801   0.9314     0.491    3.183   13.127
0.50                   18.0      0.3757   0.7757   0.9310     0.481    3.121   13.036
0.60                   19.2      0.3633   0.7707   0.9304     0.471    3.053   12.924
0.70                   20.9      0.3512   0.7643   0.9295     0.462    2.970   12.771
0.80                   23.9      0.3382   0.7552   0.9281     0.453    2.860   12.522
0.90                   32.1      0.3226   0.7394   0.9248     0.443    2.686   11.966
0.99                  167.9      0.3029   0.7065   0.9074     0.430    2.385    9.715
0.999                1518.0      0.3003   0.7007   0.9009     0.429    2.339    9.080
Lower bound               ∞      0.3000   0.7000   0.9000     0.429    2.333    9.000


Table II—The possible third parameters and queue characteristics
for an H₂/M/1 queue given c² = 12.0 with ρ = 0.3, 0.7, and 0.9

Proportion of       Skewness,    Key Root, Probability        Mean Queue Length, L
Total Mean in       Third        of Delay, σ
Component With      Moment,
Smaller Mean, r     m₃/m₁³       ρ=0.3    ρ=0.7    ρ=0.9      ρ=0.3    ρ=0.7    ρ=0.9
Upper bound           253.5      0.8923   0.9539   0.9846     2.789    15.17    58.50
0.001                 253.8      0.8921   0.9538   0.9846     2.779    15.16    58.49
0.01                  256.1      0.8897   0.9536   0.9846     2.721    15.10    58.43
0.10                  280.7      0.8590   0.9516   0.9844     2.128    14.48    57.80
0.20                  312.6      0.8006   0.9488   0.9842     1.505    13.68    56.99
0.30                  351.6      0.7114   0.9451   0.9839     1.040    12.74    56.01
0.40                  401.2      0.6142   0.9396   0.9836     0.778    11.59    54.76
0.50                  468.0      0.5311   0.9311   0.9831     0.640    10.16    53.10
0.60                  565.1      0.4650   0.9163   0.9823     0.561     8.36    50.73
0.70                  722.7      0.4120   0.8881   0.9808     0.510     6.25    47.00
0.80                 1031.7      0.3686   0.8380   0.9776     0.475     4.32    40.20
0.90                 1946.0      0.3319   0.7708   0.9646     0.449     3.05    25.39
0.99                 18,287      0.3030   0.7070   0.9089     0.430     2.39     9.88
0.999               181,638      0.3003   0.7007   0.9009     0.429     2.34     9.08
Lower bound               ∞      0.3000   0.7000   0.9000     0.429     2.33     9.00

On the other hand, for very large values of c² such as 12, the
approximation by r = 1/2 is not robust: two moments do not pin down
the H₂ distribution well. Using r = 1/2 as an approximation works
better as ρ increases and c² decreases. Of course, by Theorem 1, ρ
plays no role in the MRE over all r, but if we bound r, then ρ plays a
role. We interpret these results as providing support for H₂
approximation based on r = 1/2, but large values of m₂ or m₃ are clear danger
signals.
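To see how entries of Tables I and II arise, the sketch below combines (17) with a bisection on (10) for a single table row. It is an illustration under our assumptions (m₁ = 1, service rate 1/ρ), not the program used for the tables, and the function names are hypothetical; for c² = 2 and r = 0.5 it should give m₃/m₁³ = 18.0 and, for ρ = 0.7, σ ≈ 0.7757 and L ≈ 3.121.

```python
import math

def h2_from_m1_c2_r(m1, c2, r):
    """H2 parameters (p1, lam1, lam2) from (m1, c^2, r) via eq. (17)."""
    m2 = m1 ** 2 * (c2 + 1.0)
    A = r * m1 * m2 / 2.0
    B = -((m2 / 2.0) + (r * m1) ** 2 - (1.0 - r) ** 2 * m1 ** 2)
    C = r * m1
    lam1 = (-B + math.sqrt(B * B - 4.0 * A * C)) / (2.0 * A)
    p1 = r * m1 * lam1
    return p1, lam1, (1.0 - p1) / ((1.0 - r) * m1)

def h2m1_row(c2, r, rho, m1=1.0):
    """(m3/m1^3, sigma, L) for an H2/M/1 queue, as tabulated in Tables I and II."""
    p1, lam1, lam2 = h2_from_m1_c2_r(m1, c2, r)
    m3 = 6.0 * (p1 / lam1 ** 3 + (1.0 - p1) / lam2 ** 3)
    mu = 1.0 / (rho * m1)                             # service rate
    phi = lambda s: p1 * lam1 / (lam1 + s) + (1.0 - p1) * lam2 / (lam2 + s)
    lo, hi = 0.0, 1.0 - 1e-9                          # bisection on eq. (10)
    while hi - lo > 1e-12:
        mid = 0.5 * (lo + hi)
        if phi(mu * (1.0 - mid)) > mid:
            lo = mid
        else:
            hi = mid
    sigma = 0.5 * (lo + hi)
    return m3 / m1 ** 3, sigma, rho / (1.0 - sigma)

print(h2m1_row(2.0, 0.5, 0.7))
```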
Example 2: Example 1 was based on Prototype Distribution I from
Part II. Since Prototype I is a discrete probability mass function, it is
not a mixture of exponential distributions and is thus not entirely
satisfactory. Suppose we use the H₂ density with balanced means (r =
0.5) as a prototype instead. With m₁ = 1 and c² = 2.0, the prototype
H₂ density is

$$f(x) = p_1\lambda_1 e^{-\lambda_1 x} + p_2\lambda_2 e^{-\lambda_2 x}, \qquad x \ge 0,$$

where

p₁ = 0.78867, λ₁ = 1.577, and λ₂ = 0.42265.


Given only the first two moments, with ρ = 0.7 and 0.9, the upper
bound on σ can be obtained from Table II of Part I and the lower bound
from Theorem 2 there. These bounds are 0.466 and 0.822 for ρ = 0.7 and
0.808 and 0.936 for ρ = 0.9. The corresponding extremal characteristics
among H densities are σ_L = 0.700 and σ_u = 0.800 for ρ = 0.7 and σ_L =
0.900 and σ_u = 0.933 for ρ = 0.9.

The third moment 18.0 (see Table I) pins down the H₂ distribution,
and among all H densities with this third moment it is the lower bound.
Among H densities with m₃ = 18.0, σ_l = 0.7757 for ρ = 0.7 and
σ_l = 0.9310 for ρ = 0.9. The MRE given only two moments is 200 percent
for both ρ = 0.7 and 0.9. Working with mixtures of exponentials reduces
the MRE to 50 percent. Also specifying the third moment reduces the MRE to 12 percent
for ρ = 0.7 and 3 percent for ρ = 0.9.
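The relative errors in Example 2 follow from the identity (L_hi − L_lo)/L_lo = (σ_hi − σ_lo)/(1 − σ_hi), since L = ρ/(1 − σ) in the GI/M/1 queue. A short check using the σ values quoted in the example (the helper name is hypothetical). Note that the ρ = 0.9 figure is sensitive to rounding: σ_u = 0.9333 gives about 3.5 percent, while the rounded 0.933 used below gives about 3 percent.

```python
def relative_spread(sigma_lo, sigma_hi):
    """(L_hi - L_lo) / L_lo for GI/M/1 queues, where L = rho / (1 - sigma)."""
    return (sigma_hi - sigma_lo) / (1.0 - sigma_hi)

cases = {
    "two moments only": [(0.466, 0.822), (0.808, 0.936)],
    "mixtures of exponentials": [(0.700, 0.800), (0.900, 0.9333)],
    "plus third moment": [(0.7757, 0.800), (0.9310, 0.933)],
}
for label, pairs in cases.items():
    print(label, [round(relative_spread(lo, hi), 3) for lo, hi in pairs])
```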


VI. THE H/G/1 QUEUE 


The assumption of exponential service-time distributions played a
crucial role in Section IV. With exponential service-time distributions,
the mean queue length L depends on the interarrival-time distribution
only through its transform, so that we can apply the ordering in (8). However,
it turns out that the ordering in (9) also applies for the mean queue
length with general service-time distributions, i.e., we have

$$\mathrm{M/G/1} \le \mathrm{H/G/1} \le \mathrm{M^B/G/1}, \qquad (18)$$

by which we mean that L_L ≤ L ≤ L_u (but not the more general
stochastic order) for all systems with a common service-time
distribution and given first two moments of the interarrival-time distribution.

To obtain (18), it suffices to observe that known formulas for L in
the M/G/1 and M^B/G/1 systems agree with previously established
lower and upper bounds, respectively, for L in GI/G/1 queues having
interarrival-time distributions with increasing mean residual life and with the first
two moments of the interarrival times and service times specified.
(See Ref. 7 for more details.) This result dramatically demonstrates
that these papers have applicability beyond the special case of the
GI/M/1 model.
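The two ends of (18) can be evaluated in closed form. The sketch below uses the Pollaczek-Khinchine formula for M/G/1 and the standard M^X/G/1 mean-wait formula for the batch-Poisson process of (6), with geometric batches of mean m_B = (1 + c²)/2; the batch formula is a textbook result we assume here, not one quoted from Ref. 7, and the function names are hypothetical. With mean interarrival time 1.0, ρ = 0.7, and c_a² = c_s² = 2.0 it returns 3.15 and about 4.32, the two constant rows of Table III in Section VII.

```python
def mg1_mean_queue(rho, cs2):
    """Pollaczek-Khinchine mean number in system for M/G/1."""
    return rho + rho ** 2 * (1.0 + cs2) / (2.0 * (1.0 - rho))

def mbg1_mean_queue(m1, ca2, es, cs2):
    """Mean number in system when the batch-Poisson process of (6)
    (interarrival mean m1, squared c.v. ca2) feeds a single server with
    mean service time es and squared c.v. cs2."""
    mb = (1.0 + ca2) / 2.0                    # mean batch size m_B
    p = 1.0 / mb                              # geometric: P(X=k) = p(1-p)^(k-1)
    ex_x_minus_1 = 2.0 * (1.0 - p) / p ** 2   # E[X(X-1)] for this geometric law
    lam = 1.0 / m1                            # customer arrival rate
    rho = lam * es
    es2 = es ** 2 * (1.0 + cs2)               # E[S^2]
    # M^X/G/1 mean wait: (lam E[S^2] + E[S] E[X(X-1)]/E[X]) / (2(1 - rho))
    wq = (lam * es2 + es * ex_x_minus_1 / mb) / (2.0 * (1.0 - rho))
    return lam * (wq + es)                    # Little's law

print(mg1_mean_queue(0.7, 2.0), mbg1_mean_queue(1.0, 2.0, 0.7, 2.0))
```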


VII. THE K₂/H/1 QUEUE


Whenever the interarrival-time distribution or the service-time
distribution in a GI/G/1 queue has a Laplace-Stieltjes transform that
is a rational function, the steady-state distribution can be
characterized in terms of the roots of an equation involving the transforms
of the interarrival-time and service-time distributions; see II.5.10, 11
of Cohen.⁸ When the interarrival-time distribution has a rational
transform with a denominator of degree 2, denoted by K₂, the mean
queue length and the probability of delay depend on the service-time
distribution only through its first two moments and a single root of
an equation involving the transforms of the interarrival-time and
service-time distributions; see p. 330 of Cohen,⁸ Section V of Part I,
and Ref. 9.

Hence, for K₂/G/1 queues it is possible to find extremal
service-time distributions using the ordering of transforms in (8). Let GE₂
denote the convolution of two exponential distributions (an Erlang,
E₂, is a special case), which is K₂. An H₂ distribution is also K₂.
Paralleling Section V of Part I, we obtain from the analysis in Ref. 9
that

$$\mathrm{GE_2/M^B/1} \le \mathrm{GE_2/H/1} \le \mathrm{GE_2/M/1} \qquad (19)$$

and

$$\mathrm{H_2/M/1} \le \mathrm{H_2/H/1} \le \mathrm{H_2/M^B/1}, \qquad (20)$$


by which we mean that the mean queue lengths are ordered as
indicated. A significant feature of (19) and (20) is that the maximizing
distributions are different for the different K₂ interarrival-time
distributions. (This is explained in Ref. 9.) By M, we mean the extremal
service-time distribution F_l for large b. As b → ∞, the distribution
approaches the exponential distribution, but the fixed variance of
F_l is lost in the limit. As b → ∞, the key root in the equation for the
K₂/G/1 queue approaches the root for the K₂/M/1 queue, but the
mean queue length also depends on the variance of F_l. The mean
queue length in the K₂/M/1 system is the limit as b → ∞ of the mean
queue length in the K₂/G/1 system with service-time distributions
F_l. This limiting mean queue length can be computed by using the
fixed service-time variance together with the root for the K₂/M/1
system.⁸,⁹

As in Section V, if we specify three service-time moments instead
of two, the M^B bound is unchanged, but the M bound is replaced by
the H₂ distribution uniquely determined by the three moments; i.e.,
with the interarrival-time distribution and three moments of the
service time specified, we get

$$\mathrm{GE_2/M^B/1} \le \mathrm{GE_2/H/1} \le \mathrm{GE_2/H_2/1} \qquad (21)$$

and

$$\mathrm{H_2/H_2/1} \le \mathrm{H_2/H/1} \le \mathrm{H_2/M^B/1}. \qquad (22)$$


We conclude by exhibiting the mean queue length, L, for several
K₂/H₂/1 queues. We consider five different H₂ service-time
distributions with a common mean 0.7 and a common squared coefficient of
variation c_s² = 2.0. (We use subscripts "s" and "a" to indicate that
parameters are associated with the service-time distribution or the
interarrival-time distribution, respectively.) As in Section V, the H₂ distributions
are characterized by the parameter r_s. We consider distributions close
to the two extremal distributions M^B (r_s = 0.01) and M (r_s = 0.99), as
well as the intermediate values r_s = 0.1, 0.5, and 0.9. The limiting case r_s = 1.0
differs from the exponential distribution because the small mass at a
large value, necessary to have c² = 2.0 instead of 1.0, still has an effect.
(This is not the case for the H₂ interarrival-time distributions.)

We consider six interarrival-time distributions: the same five H₂
distributions and the Erlang (E₂) distribution. All the
interarrival-time distributions have mean 1.0, so that the traffic intensity is always
ρ = 0.7. As with the service-time distributions, the H₂
interarrival-time distributions have squared coefficient of variation c_a² = 2.0.

The results for the 30 cases are displayed in Table III. For the
extremal H interarrival-time distributions, M^B and M, the mean queue
length, L, does not depend on r_s because L depends on the
service-time distribution only through its first two moments.’ The range of L
values over r_s increases for H₂ interarrival-time distributions as r_a
moves away from the endpoints 0.0 and 1.0. The range is bigger for c_a²
= 2.0 (H₂) than for c_a² = 0.5 (E₂) when r_a = 0.5, but obviously not for
all r_a.

Table III gives an indication of the quality of two-moment
approximations for GI/G/1 queues when c_a² = c_s² = 2.0 and ρ = 0.7. A natural
two-moment approximation would be based on the H₂/H₂/1 queue
with c_a² = c_s² = 2.0 and r_a = r_s = 0.5. The range of H₂/H₂/1 values as
r_a and/or r_s varies indicates the possible deviations from the
approximations when the distributions are required to be mixtures of
exponential distributions. The maximum relative error is (4.32 − 3.15)/3.15,
or 37 percent, but would be much less if we restricted r_a and r_s to some
reasonable interval, e.g., [0.2, 0.8].


Table III—The mean queue length, L, in several K₂/H₂/1 systems
with traffic intensity ρ = 0.7

                                      Service-Time Distribution
                             (M^B)       Hyperexponential (H₂)            (M)
Interarrival-Time         r_s = 0.01  r_s = 0.1  r_s = 0.5  r_s = 0.9  r_s = 0.99
Distribution
Erlang (E₂)                  2.61       2.62       2.63       2.63       2.63
H₂, r_a = 0.99 (M)           3.15       3.15       3.15       3.15       3.15
H₂, r_a = 0.9                3.60       3.60       3.59       3.56       3.52
H₂, r_a = 0.5                4.02       4.01       3.99       3.96       3.94
H₂, r_a = 0.1                4.23       4.23       4.21       4.21       4.20
H₂, r_a = 0.01 (M^B)         4.32       4.32       4.32       4.32       4.32

Notes: 1. The hyperexponential (H₂) distributions all have squared coefficient of
variation c² = 2.00. 2. The Erlang (E₂) distribution has squared coefficient of variation
c² = 0.5. 3. The M service-time distribution differs from an exponential distribution
because of the small mass at a very large value. This causes the H₂/M/1 values of L here to
differ from the H₂/M/1 values of L in Table I.

Table III enables us to compare the effect of additional information 
about the interarrival-time and service-time distributions. Table III 
shows that, given two moments, other properties of the distribution 
are much more important for the interarrival-time distribution than 
for the service-time distribution in determining the mean queue length. 
This phenomenon was previously noted by Sahin and Perrakis.¹⁰

The program for calculating the mean queue length and the
probability of delay in a K₂/G/1 queue used to obtain Table III is being used
as part of a three-parameter procedure for approximating general
G/G/1 queues with bursty, possibly nonrenewal arrival processes.¹¹
The general bursty arrival process is approximated by a renewal
process with an H₂ interarrival-time distribution, which is
characterized completely by the first three moments of the renewal interval.⁶
Then the expected queue length and probability of delay are calculated
exactly for the resulting H₂/G/1 model. Additional descriptions of the
H₂/G/1 queue, such as an entire waiting-time distribution, are
obtained using approximations similar to the ones in the software
package QNA (see Section 5.1 of Ref. 12). This approach is part of a new
three-parameter algorithm for QNA.
Computing Inductive Noise of Chip Packages
Inductive noise limits the physical design of high-speed, high pin-out chip
packages. This paper presents the derivation of some basic equations that are 
useful for computing inductive noise of various chip packages, and also 
presents simple asymptotic and limiting results that reduce to some useful 
approximate results proposed by others. These results are helpful for comput- 
ing inductive noise in arrays of wire bonds, solder balls, dual in-line package 
leads, package pins, and connector pins. Computed results agreed well with 
measured results. We present two simple rules for minimizing inductive noise 
and also discuss the inductive noise of power and ground planes. 


I. INTRODUCTION 


If n drivers each switch current at İ = dI/dt = 20 mA/ns, the inductive noise
across a common ground lead inductance, L_g, is approximately nL_gİ.
For a 32-bit processor (n = 32) and L_g = 1 nH, this inductive noise component
is about (32)(1 nH)(20 mA/ns) = 640 mV. It is known that a 50-mil-long
wire bond used as an input/output (I/O) lead of an integrated
circuit chip has a self-inductance of about 1 nH. Thus, many such
leads must be connected in parallel to reduce this inductive noise
component to tolerable levels. This is necessary because present
large-scale integrated (LSI) circuits have a total noise margin of only a few
hundred millivolts. This inductive noise problem has been recognized
and discussed by C. W. Deisch.¹
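The estimate above is a single product, since nH × mA/ns = mV. A sketch with hypothetical names; the lead count at the end assumes an illustrative 100 mV budget and ideal, uncoupled parallel leads, an idealization that the mutual-inductance results of the following sections show to be optimistic.

```python
import math

def common_lead_noise_mV(n_drivers, L_nH, didt_mA_per_ns):
    """Inductive noise n * L * dI/dt; nH times mA/ns gives mV."""
    return n_drivers * L_nH * didt_mA_per_ns

v = common_lead_noise_mV(32, 1.0, 20.0)
print(v)  # 640.0

# Hypothetical illustration: if k identical leads shared the current with no
# mutual coupling, the effective inductance would be L/k, so the noise scales
# as 1/k.  Mutual inductance between the leads makes the real reduction smaller.
budget_mV = 100.0
print(math.ceil(v / budget_mV))  # 7 leads under this idealization
```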

This paper derives some general equations that are useful for com- 
puting the inductive noise of high-speed, high pin-out chip packages. 
A basic role is played by the mutual inductance between two parallel
conductors.


II. MUTUAL INDUCTANCE OF TWO PARALLEL CONDUCTORS


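Consider the two parallel conductors shown in Fig. 1. By applying
the Biot-Savart law,² a current I flowing in the y direction in a
conductor of length ℓ (lying on the y axis from y₀ = −ℓ/2 to y₀ = ℓ/2) produces a
magnetic flux density, B(x, y), given by

$$B(x, y) = \frac{\mu I}{4\pi}\int_{-\ell/2}^{\ell/2} \frac{\sin\theta}{r^2}\,dy_0 = \frac{\mu I}{4\pi}\int_{-\ell/2}^{\ell/2} \frac{x}{r^3}\,dy_0 \qquad (1)$$

$$= \frac{\mu I}{4\pi x}\left[\frac{y + \ell/2}{\sqrt{(y + \ell/2)^2 + x^2}} - \frac{y - \ell/2}{\sqrt{(y - \ell/2)^2 + x^2}}\right], \qquad (2)$$

where r² = x² + (y − y₀)², x = r sin θ, and μ = permeability of the
medium. The total flux, Λ, linking the idle conductor (of the same
length, at separation d) is then given by

$$\Lambda = \int_d^\infty dx \int_{-\ell/2}^{\ell/2} B(x, y)\,dy = \frac{\mu I \ell}{2\pi}\left[\ln\left(\frac{\ell}{d} + \sqrt{1 + \left(\frac{\ell}{d}\right)^2}\right) - \sqrt{1 + \left(\frac{d}{\ell}\right)^2} + \frac{d}{\ell}\right]. \qquad (3)$$

If the medium is a vacuum or air, then μ = μ₀ = 4π × 10⁻⁷ H/m and the
mutual inductance, M = Λ/I, is given by

$$M = \frac{\mu_0}{2\pi}\,\ell\left[\ln\left(\frac{\ell}{d} + \sqrt{1 + \left(\frac{\ell}{d}\right)^2}\right) - \sqrt{1 + \left(\frac{d}{\ell}\right)^2} + \frac{d}{\ell}\right], \qquad (4)$$

where

μ₀/2π = 5 nH/in
ℓ = length in inches
d = separation in inches.

Fig. 1—Coordinate system for derivation of mutual inductance. (Current
element at (0, y₀) on the y axis; field point (x, y); r² = x² + (y − y₀)²,
x = r sin θ; idle conductor at separation d.)

A useful asymptotic result for small d/ℓ is given by

$$M \approx 5\ell\left[\ln\left(\frac{2\ell}{d}\right) - 1 + \frac{d}{\ell} - \frac{1}{4}\left(\frac{d}{\ell}\right)^2\right] \text{ nH}. \qquad (5)$$

Equation (4) can also be derived by evaluating the Neumann inductance
integral. Equations (4) and (5) agree with eqs. (1) and (3) of Ref. 3.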
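Equations (4) and (5) are straightforward to evaluate. The sketch below uses the rounded value μ₀/2π = 5 nH/in adopted in the text (the unrounded value is about 5.08 nH/in); the function names are hypothetical. For ℓ = 0.2 in. and d = 0.1 in., the exact and asymptotic forms give about 0.826 nH and 0.824 nH.

```python
import math

MU0_OVER_2PI_NH_PER_IN = 5.0   # the text's rounding of mu0/(2 pi), nH/in

def mutual_inductance_nH(length_in, sep_in):
    """Exact partial mutual inductance of two parallel filaments, eq. (4)."""
    q = length_in / sep_in
    return MU0_OVER_2PI_NH_PER_IN * length_in * (
        math.log(q + math.sqrt(1.0 + q * q))    # equals asinh(l/d)
        - math.sqrt(1.0 + 1.0 / (q * q))
        + 1.0 / q)

def mutual_inductance_asymptotic_nH(length_in, sep_in):
    """Small-d/l expansion, eq. (5)."""
    x = sep_in / length_in
    return MU0_OVER_2PI_NH_PER_IN * length_in * (
        math.log(2.0 / x) - 1.0 + x - 0.25 * x * x)

for d in (0.01, 0.1, 0.2):
    print(d, mutual_inductance_nH(0.2, d), mutual_inductance_asymptotic_nH(0.2, d))
```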

The inductances discussed in this paper are more precisely known 
as partial self and mutual inductances. However, we shall follow Ref. 
3 and refer to them as merely self and mutual inductances. 


III. SELF-INDUCTANCE OF A STRAIGHT CONDUCTOR
3.1 Self-inductance resulting from the internal field 


Consider the current element shown in Fig. 2. A basic definition of
self-inductance, L, is

$$L = \frac{\Lambda}{I} = \text{total number of flux linkages per ampere}. \qquad (6)$$

Fig. 2—Notation for derivation of self-inductance. (I = current,
ρ = radius.)
From Maxwell's equation² (i.e., Ampère's law), the magnetic intensity,
H_i, internal to the conductor is given by

$$\oint \mathbf{H}_i \cdot d\mathbf{l} = H_i(2\pi r) = I\left(\frac{r}{\rho}\right)^2, \qquad 0 \le r \le \rho. \qquad (7)$$

Equation (7) assumes that the current density, I/(πρ²), is uniform in
the conductor. (Skin effect is neglected. References 2, 3, and 4 show
that skin effect tends to reduce the internal self-inductance, L_i.) From
eq. (7),

$$H_i = \frac{I}{2\pi r}\left(\frac{r}{\rho}\right)^2, \qquad 0 \le r \le \rho. \qquad (8)$$

The flux density, B_i, internal to the conductor is then

$$B_i = \mu H_i = \frac{\mu I}{2\pi r}\left(\frac{r}{\rho}\right)^2, \qquad 0 \le r \le \rho. \qquad (9)$$

As Ref. 4 shows, a given flux line of radius r ≤ ρ encloses a fraction
(r/ρ)² of the total current I. Thus, from eqs. (6) and (9), the
self-inductance, L_i, resulting from the internal magnetic field is

$$L_i = \frac{\Lambda_i}{I} = \frac{\ell}{I}\int_0^\rho B_i\left(\frac{r}{\rho}\right)^2 dr = \frac{\mu\ell}{8\pi}. \qquad (10)$$

Equation (10) can also be derived from internal energy considerations.
If μ = μ₀ and ℓ is in inches,

$$L_i = \left(\frac{\mu_0}{2\pi}\right)\left(\frac{\ell}{4}\right) = 5\left(\frac{\ell}{4}\right) \text{ nH}. \qquad (11)$$


3.2 Total self-inductance 


The total self-inductance, L_s, of a straight conductor is obtained by
adding the contributions from the external and internal magnetic
fields. Thus, from eqs. (4) and (11) we have

$$L_s = M\big|_{d=\rho} + L_i = M\big|_{d=\rho} + 5\left(\frac{\ell}{4}\right) \text{ nH}. \qquad (12)$$

Also, for small ρ/ℓ,

$$L_s \approx 5\ell\left[\ln\left(\frac{2\ell}{\rho}\right) - \frac{3}{4}\right] \text{ nH}. \qquad (13)$$

Equation (13) agrees with eq. (7) of Ref. 3.
When the cross section of the conductor is rectangular, Ref. 3 shows
that the self-inductance is given by

$$L_s \approx 5\ell\left[\ln\left(\frac{4\ell}{p}\right) + \frac{1}{2}\right] \text{ nH}, \qquad (14)$$

where p = perimeter of cross section in inches.
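Combining (4), (11), and (12) gives the total self-inductance of a round conductor. A sketch with hypothetical names and the text's rounded μ₀/2π = 5 nH/in. The wire-bond case at the end assumes a 1-mil (0.001 in.) diameter, which is our assumption rather than a value from the text; it lands near the "about 1 nH" quoted in the introduction.

```python
import math

K = 5.0  # mu0/(2 pi) in nH/in, as rounded in the text

def mutual_nH(l, d):
    """Eq. (4), lengths in inches."""
    q = l / d
    return K * l * (math.log(q + math.sqrt(1 + q * q)) - math.sqrt(1 + 1 / (q * q)) + 1 / q)

def self_inductance_nH(l, radius):
    """Eq. (12): external part M(d = radius) plus internal part (11), 5*l/4."""
    return mutual_nH(l, radius) + K * l / 4.0

def self_inductance_asymptotic_nH(l, radius):
    """Eq. (13), valid for small radius/length."""
    return K * l * (math.log(2 * l / radius) - 0.75)

print(self_inductance_nH(0.2, 0.01), self_inductance_asymptotic_nH(0.2, 0.01))
# Hypothetical 50-mil wire bond with an assumed 1-mil diameter (radius 0.0005 in.).
print(self_inductance_nH(0.05, 0.0005))
```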


IV. INDUCTANCE OF POWER AND GROUND PLANES 


Consider the power and ground (P/G) planes shown in Fig. 3.
Assume that both planes carry equal, thin sheets of current, I, in
opposite directions. Again, using Ampère's law,

$$\oint \mathbf{H} \cdot d\mathbf{l} = I. \qquad (15)$$

The magnetic field is more intense and approximately uniform in the
space between the P/G planes. The magnetic field outside the planes
is assumed to be negligible because of field cancellation. Thus,

$$|\mathbf{H}|\,W = HW = I, \qquad (16)$$

$$B = \mu H = \frac{\mu I}{W}, \qquad (17)$$

and

$$L = \frac{\Lambda}{I} = \frac{B\ell h}{I} = \frac{\mu h\ell}{W}. \qquad (18)$$

Fig. 3—Notation for derivation of inductance of P/G planes. (Power
plane above ground plane; spacing h, width W, length ℓ; h ≪ W and h ≪ ℓ.)

If μ = μ₀ and ℓ is in inches,

$$L = \left(\frac{\mu_0}{2\pi}\right)(2\pi\ell)\left(\frac{h}{W}\right) = 10\pi\ell\left(\frac{h}{W}\right) \text{ nH}. \qquad (19)$$

One half of this L is associated with the ground plane and the other
half is associated with the power plane. Thus, the inductance of the
ground plane, L_g, and the inductance of the power plane, L_p, are given
by

$$L_g = L_p = 5\pi\ell\left(\frac{h}{W}\right) \text{ nH}. \qquad (20)$$


V. COMPUTING INDUCTIVE NOISE 
5.1 Pair of conductors 


Consider the pair of conductors shown in Fig. 4. The noise voltage, $v_n$, is a result of the self and mutual inductances of the conductors. Thus, using eqs. (12) and (4),

$$v_n = L_s\dot I - M\dot I = (L_s - M)\dot I\ \text{mV}, \tag{21}$$

where $\dot I = dI/dt$ = time rate of change of current, mA/ns. For small $d/\ell$, eqs. (5), (13), and (21) yield the asymptotic result

$$v_n \approx 5\ell\dot I\left[\ln\left(\frac{d}{\rho}\right) + \frac{1}{4} - \frac{d}{\ell} + \frac{1}{4}\left(\frac{d}{\ell}\right)^2\right]\ \text{mV}, \tag{22}$$

where $\ell$, $d$, and $\rho$ are expressed in inches and $\dot I$ is expressed in mA/ns. As $d/\ell \to 0$, eq. (22) agrees with eq. (6-26) of Ref. 4 and eq. (16) of Ref. 3. Also, the first term of eq. (22), or

$$v_n = 5\ell\dot I\ln\left(\frac{d}{\rho}\right)\ \text{mV}, \tag{23}$$

was proposed as an approximation in the work reported in Ref. 1.

Fig. 4—Notation for derivation of inductive noise voltage, $v_n$ (radius $\rho$, separation $d$, current rate $dI/dt$).

Table I shows some numerical values of inductive noise voltages for a pair of conductors such that $\dot I = 20$ mA/ns, $\ell = 0.2$ inch, $\rho = 0.01$ inch, and $d = 0.1$, 0.2, 0.3, and 1.0 inch. These numerical results show that the “exact” inductive noise voltage of eq. (21) is less than that given by the approximate eq. (23). The gap is about 6 percent at $d/\ell = 0.5$ and nearly 40 percent at $d/\ell = 5$. However, for small $d/\ell$ the results obtained from eqs. (23) or (22) are suitable. For $d/\ell \le 1$, eq. (22) stays within about 2 percent of eq. (21).

In general, the duration of the inductive noise voltages is approximately equal to the signal rise time.

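The three columns of Table I can be regenerated with a short Python sketch of eqs. (21), (22), and (23). As before, eq. (4) is assumed to be the standard parallel-filament formula, because its exact form is not reproduced in this section. Under that assumption, the computed values agree with Table I to within its rounding (about 0.1 mV).

```python
import math


def mutual_nh(l: float, d: float) -> float:
    """Assumed form of eq. (4): parallel filaments, lengths in inches, nH."""
    u = d / l
    return 5.0 * l * (math.asinh(l / d) - math.sqrt(1.0 + u * u) + u)


def self_nh(l: float, rho: float) -> float:
    """Eq. (12): M evaluated at d = rho, plus internal inductance 5(l/4)."""
    return mutual_nh(l, rho) + 5.0 * l / 4.0


def noise_exact(l, rho, d, idot):
    """Eq. (21): (L_s - M) * I, in mV when I is in mA/ns."""
    return (self_nh(l, rho) - mutual_nh(l, d)) * idot


def noise_asymptotic(l, rho, d, idot):
    """Eq. (22), valid for small d/l."""
    u = d / l
    return 5.0 * l * idot * (math.log(d / rho) + 0.25 - u + 0.25 * u * u)


def noise_approx(l, rho, d, idot):
    """Eq. (23): first term of eq. (22) only."""
    return 5.0 * l * idot * math.log(d / rho)


l, rho, idot = 0.2, 0.01, 20.0   # inches, inches, mA/ns (Table I)
for d in (0.1, 0.2, 0.3, 1.0):
    print(f"d={d:4.1f}  eq23={noise_approx(l, rho, d, idot):6.1f}  "
          f"eq21={noise_exact(l, rho, d, idot):6.2f}  "
          f"eq22={noise_asymptotic(l, rho, d, idot):6.1f}")
```

The last row shows why eq. (22) should be confined to small $d/\ell$: its $(d/\ell)^2$ term overwhelms the result at $d/\ell = 5$.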

5.2 Array of conductors 
5.2.1 General equations 


Consider an array of $N + 1$ conductors having equal lengths, $\ell$, equal radii, $\rho$, and separations, $d_i$, as shown in Fig. 5. Let us compute the inductive noise voltage, $v_n$, induced in a particular conductor, located at the center for convenience. Let us denote the rate of current change by $\dot I_i$ (mA/ns), where $i = 0, \ldots, N$, with $i = 0$ denoting the particular conductor of interest.

Table I—Inductive noise voltages for a pair of conductors
(ℓ = 0.2 in., ρ = 0.01 in., İ = 20 mA/ns)

d (in.)   d/ℓ    Approximate (eq. 23)   Exact (eq. 21)   Asymptotic (eq. 22)
0.1       0.5    46.0 mV                43.25 mV         42.3 mV
0.2       1.0    60.0                   50.4             49.9
0.3       1.5    68.0                   53.3             54.3
1.0       5      92.1                   57.8             122.1

Fig. 5—Array of conductors (length $\ell$, radius $\rho$, separations $d_i$; + and - mark positive and negative current directions).

One component of the inductive noise voltage, $v_m$, is a result of the mutual inductances (i.e., inductive crosstalk) and is given by

$$v_m = \sum_{i=1}^{N} M_i\dot I_i\ \text{mV}, \tag{24}$$

where $M_i$ is given by eq. (4) with $M_i = M\big|_{d=d_i}$.

The other component of the inductive noise voltage, $v_s$, is a result of the self-inductance of the particular conductor and is given by

$$v_s = L_s\dot I_0\ \text{mV}, \tag{25}$$

where $L_s$ is given by eq. (12). Thus, the total inductive noise voltage, $v_n$, induced in the particular conductor is given by

$$v_n = v_s + v_m = L_s\dot I_0 + \sum_{i=1}^{N} M_i\dot I_i\ \text{mV}. \tag{26}$$


Equation (26) is the most general equation for computing the inductive 
noise of an array of conductors. 
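Eq. (26) is a plain weighted sum, so it is easy to evaluate for any set of separations and current rates. The sketch below again assumes the parallel-filament form of eq. (4). The function name and the two-conductor check are illustrative. With one neighbor carrying the return current ($\dot I_1 = -\dot I_0$), the sum reduces to eq. (21) and reproduces the 43.25 mV “exact” entry of Table I.

```python
import math
from typing import Sequence


def mutual_nh(l: float, d: float) -> float:
    """Assumed form of eq. (4): parallel filaments, lengths in inches, nH."""
    u = d / l
    return 5.0 * l * (math.asinh(l / d) - math.sqrt(1.0 + u * u) + u)


def self_nh(l: float, rho: float) -> float:
    """Eq. (12): M at d = rho plus the internal term 5(l/4)."""
    return mutual_nh(l, rho) + 5.0 * l / 4.0


def noise_general_mv(l: float, rho: float, idot0: float,
                     seps: Sequence[float], idots: Sequence[float]) -> float:
    """Eq. (26): v_n = L_s*I_0 + sum_i M_i*I_i (mV, rates in mA/ns)."""
    v_s = self_nh(l, rho) * idot0                                 # eq. (25)
    v_m = sum(mutual_nh(l, d) * i for d, i in zip(seps, idots))   # eq. (24)
    return v_s + v_m


# Pair of conductors of Table I: victim carries +20 mA/ns, neighbor returns it.
print(round(noise_general_mv(0.2, 0.01, 20.0, [0.1], [-20.0]), 2))  # 43.25
```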
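If the $\dot I_i$ in the array of conductors are constrained so that they satisfy the subsidiary condition (i.e., Kirchhoff’s current law),

$$\sum_{i=0}^{N}\dot I_i = 0, \tag{27}$$

then eq. (26) can be written as

$$v_n = -\sum_{i=1}^{N}\left(L_s - M_i\right)\dot I_i\ \text{mV}. \tag{28}$$

This equation is a generalization of eq. (21). Also, if eq. (27) holds, then for small $d_i/\ell$ eqs. (5), (13), and (26) yield the asymptotic result

$$v_n \approx -5\ell\left\{\dot I_0\left(\ln\rho - \frac{1}{4}\right) + \sum_{i=1}^{N}\dot I_i\left[\ln d_i - \frac{d_i}{\ell} + \frac{1}{4}\left(\frac{d_i}{\ell}\right)^2\right]\right\}\ \text{mV}, \tag{29}$$

where $\ell$, $d_i$, and $\rho$ are expressed in inches and the $\dot I$'s are expressed in mA/ns.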

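The following numerical check of eqs. (26), (28), and (29) again assumes the parallel-filament form of eq. (4). The array is hypothetical: a 0.1-inch victim lead has four neighbors at 0.02 and 0.04 inch, each switching 20 mA/ns, and the victim carries the full return so that eq. (27) holds. The check shows that (26) and (28) coincide. It also shows that the asymptotic eq. (29) stays within a fraction of a percent when the $d_i/\ell$ are small.

```python
import math


def mutual_nh(l, d):
    """Assumed form of eq. (4) (parallel filaments), nH, lengths in inches."""
    u = d / l
    return 5.0 * l * (math.asinh(l / d) - math.sqrt(1.0 + u * u) + u)


def self_nh(l, rho):
    """Eq. (12)."""
    return mutual_nh(l, rho) + 5.0 * l / 4.0


def v26(l, rho, idot0, seps, idots):
    """Eq. (26): general form, no constraint on the currents."""
    return self_nh(l, rho) * idot0 + sum(mutual_nh(l, d) * i for d, i in zip(seps, idots))


def v28(l, rho, seps, idots):
    """Eq. (28): valid only when eq. (27) holds."""
    ls = self_nh(l, rho)
    return -sum((ls - mutual_nh(l, d)) * i for d, i in zip(seps, idots))


def v29(l, rho, idot0, seps, idots):
    """Eq. (29): asymptotic form; l, d_i, rho must be in inches (bare logs)."""
    total = idot0 * (math.log(rho) - 0.25)
    for d, i in zip(seps, idots):
        u = d / l
        total += i * (math.log(d) - u + 0.25 * u * u)
    return -5.0 * l * total


l, rho = 0.1, 0.0005               # inches (hypothetical array)
seps = [0.02, 0.04, 0.02, 0.04]    # inches
idots = [20.0] * 4                 # mA/ns
idot0 = -sum(idots)                # eq. (27): victim carries the return

print(round(v26(l, rho, idot0, seps, idots), 2),
      round(v28(l, rho, seps, idots), 2),
      round(v29(l, rho, idot0, seps, idots), 2))
```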
If all $d_i/\ell \to 0$, eq. (29) reduces to eq. (6-27) in Ref. 4. If all $d_i/\ell \to 0$ and $|\ln\rho| \gg 1/4$, eq. (29) reduces to eq. (2) of Ref. 1, namely,

$$v_n = -5\ell\left[\dot I_0\ln\rho + \sum_{i=1}^{N}\dot I_i\ln d_i\right]\ \text{mV}. \tag{30}$$

If all conductor separations $d_i \to \infty$, only the self-inductance of the conductor introduces inductive noise and eq. (26) reduces to

$$v_n = L_s\dot I_0. \tag{31}$$

In contrast, eqs. (29) and (30) are not applicable if any of the $d_i \to \infty$.


5.2.2 Grounded wire bonds 


As an example of computing inductive noise voltage across common ground leads in an array of conductors, consider the particular array of signal, power, and ground wire bonds on an integrated circuit chip shown in Fig. 6. Let us suppose each of the 32 signal bits switches current simultaneously, at a rate of $\dot I$ mA/ns, through the signal (S) wire bonds, as indicated in Fig. 6. What is the induced noise voltage, $v_n$, across the common ground (G) leads? We shall assume that when the G leads switch, the power (P) leads are idle. This is a property shared by many chip driver circuits. We shall also assume that the magnetic fields associated with wire bonds on different sides of the chip do not interact significantly. Finally, the chip driver circuits are assumed to be approximately uniformly loaded.

From Kirchhoff’s current law,

$$2\dot I_1 + \dot I_2 + 8\dot I = 0. \tag{32}$$

From eq. (26), the voltage, $v_1$, at the corner grounds is

$$v_1 = L_s\dot I_1 + \dot I m_1 + \dot I m_2 + \dot I_1 M_{10\Delta} + \dot I_2 M_{5\Delta}, \tag{33}$$

where $L_s$ is given by eq. (12),

$$m_1 = M_{\Delta} + M_{2\Delta} + M_{3\Delta} + M_{4\Delta},$$

$$m_2 = M_{6\Delta} + M_{7\Delta} + M_{8\Delta} + M_{9\Delta},$$

and $M_{i\Delta}$ is given by eq. (4) with $M_{i\Delta} = M\big|_{d=i\Delta}$. Similarly, the voltage, $v_2$, at the center grounds is

$$v_2 = L_s\dot I_2 + 2\dot I m_1 + 2\dot I_1 M_{5\Delta}. \tag{34}$$

By equating $v_1 = v_2$, the common ground condition, and using eq. (32), there result two simultaneous equations:

$$A_1\dot I_1 + B_1\dot I_2 = C_1\dot I, \tag{35}$$

$$2\dot I_1 + \dot I_2 = -8\dot I, \tag{36}$$

where

$$A_1 = L_s + M_{10\Delta} - 2M_{5\Delta}, \qquad B_1 = M_{5\Delta} - L_s, \qquad C_1 = m_1 - m_2.$$

The solution of eqs. (35) and (36) is

$$\frac{\dot I_1}{\dot I} = \frac{C_1 + 8B_1}{A_1 - 2B_1}, \tag{37}$$

$$\frac{\dot I_2}{\dot I} = \frac{-2\left(4A_1 + C_1\right)}{A_1 - 2B_1}. \tag{38}$$

From eq. (34), the inductive noise voltage, $v_n$, across the grounded wire bonds becomes

$$v_n = v_2 = \left[L_s\left(\frac{\dot I_2}{\dot I}\right) + 2m_1 + 2\left(\frac{\dot I_1}{\dot I}\right)M_{5\Delta}\right]\dot I\ \text{mV}. \tag{39}$$

If $\ell = 0.1$ inch, $\Delta = 0.02$ inch, $\rho = 0.0005$ inch, and $\dot I = 20$ mA/ns, the results are

$$L_s = 2.623\ \text{nH}, \qquad \dot I_1 = -2.577\dot I\ \text{mA/ns}, \qquad \dot I_2 = -2.846\dot I\ \text{mA/ns},$$

$$m_1 = 1.1675\ \text{nH}, \qquad M_{5\Delta} = 0.1226\ \text{nH}, \qquad v_n = -115.2\ \text{mV}. \tag{40}$$

Fig. 6—Driver circuits on integrated circuit chip switching 32 signal bits simultaneously, with $v_n = -115$ mV.

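Eqs. (35) to (39) amount to a 2 × 2 linear solve followed by one evaluation, and the Python sketch below codes them directly. Two caveats apply. First, eq. (4) is again assumed to be the parallel-filament formula. Second, the lead layout is a hypothetical reading of Fig. 6, chosen to match the terms of eqs. (33) and (34): corner grounds at positions 0 and $10\Delta$, center ground at $5\Delta$, and signals at the remaining multiples of $\Delta$. This layout need not reproduce the published currents of eq. (40). The solved currents are checked against eq. (36) and the common-ground condition $v_1 = v_2$. The last line evaluates eq. (39) with the published values of eq. (40), taking the 0.1226 nH entry as $M_{5\Delta}$.

```python
import math


def mutual_nh(l, d):
    """Assumed form of eq. (4): parallel filaments, lengths in inches, nH."""
    u = d / l
    return 5.0 * l * (math.asinh(l / d) - math.sqrt(1.0 + u * u) + u)


def self_nh(l, rho):
    """Eq. (12): M at d = rho plus the internal term 5(l/4)."""
    return mutual_nh(l, rho) + 5.0 * l / 4.0


def solve_split(ls, m1, m2, m5, m10):
    """Eqs. (35)-(38): return (I1/I, I2/I) for the corner and center grounds."""
    a1 = ls + m10 - 2.0 * m5
    b1 = m5 - ls
    c1 = m1 - m2
    den = a1 - 2.0 * b1
    r1 = (c1 + 8.0 * b1) / den            # eq. (37)
    r2 = -2.0 * (4.0 * a1 + c1) / den     # eq. (38)
    return r1, r2


def v_corner(ls, m1, m2, m5, m10, r1, r2, idot):
    """Eq. (33): corner-ground voltage, mV."""
    return (ls * r1 + m1 + m2 + r1 * m10 + r2 * m5) * idot


def v_center(ls, m1, m5, r1, r2, idot):
    """Eqs. (34)/(39): center-ground voltage, mV."""
    return (ls * r2 + 2.0 * m1 + 2.0 * r1 * m5) * idot


# Hypothetical reading of one side of Fig. 6: leads at k*delta, k = 0..10.
l, delta, rho, idot = 0.1, 0.02, 0.0005, 20.0   # inches, mA/ns


def m_at(k):
    """M_{k*delta} from the assumed eq. (4)."""
    return mutual_nh(l, k * delta)


ls = self_nh(l, rho)
m1 = sum(m_at(k) for k in (1, 2, 3, 4))
m2 = sum(m_at(k) for k in (6, 7, 8, 9))
r1, r2 = solve_split(ls, m1, m2, m_at(5), m_at(10))
v1 = v_corner(ls, m1, m2, m_at(5), m_at(10), r1, r2, idot)
v2 = v_center(ls, m1, m_at(5), r1, r2, idot)
print(f"I1/I = {r1:.3f}, I2/I = {r2:.3f}, v1 = {v1:.1f} mV, v2 = {v2:.1f} mV")

# Eq. (39) with the values published in eq. (40), 0.1226 nH taken as M_{5 delta}
v_pub = v_center(2.623, 1.1675, 0.1226, -2.577, -2.846, 20.0)
print(f"eq. (39) with eq. (40) values: {v_pub:.1f} mV")   # -115.2
```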

As an approximation, one can assume a uniform distribution of return current rates and apply eq. (30). The results are

$$\dot I_1 = \dot I_2 = -8\dot I/3 = -2.667\dot I\ \text{mA/ns}, \qquad v_1 = -122.9\ \text{mV}, \qquad v_2 = -91.91\ \text{mV}. \tag{41}$$

The average of $v_1$, $v_1$, and $v_2$ (one value for each of the two corner grounds and the center ground) is $-112.57$ mV, which is approximately equal to $v_n$ of eq. (40). This averaging method was used in Ref. 1, and it can also be used with the exact eqs. (26) or (28) to obtain approximate results.

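The averaging step uses one value per ground lead on a side: two corner grounds and one center ground. The short check below uses the published eq. (41) values.

```python
n_signals, n_grounds, idot = 8, 3, 20.0   # per side: 8 signals, 2 corner + 1 center ground
i_ground = -n_signals * idot / n_grounds  # uniform return rate, -8*I/3 per ground (mA/ns)

v1, v2 = -122.9, -91.91                   # eq. (41), corner and center grounds (mV)
v_avg = (2 * v1 + v2) / 3                 # two corner grounds, one center ground

print(round(i_ground / idot, 3), round(v_avg, 2))   # -2.667 -112.57
```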
As a nonsymmetrical example, consider the particular array of wire bonds shown in Fig. 7. To simplify the analysis, we shall neglect the mutuals on the nonsignal sides of the chip. By eq. (27), we have

$$2\left[\dot I_3 + 2\dot I_1 + \dot I_2\right] + 32\dot I = 0. \tag{42}$$

Again, by using eq. (26), one can write equations for $v_1$, the voltage across the corner grounds, $v_2$, the voltage across the center grounds, and

$$v_3 = L_s\left(\frac{\dot I_3}{3}\right). \tag{43}$$

By equating $v_1 = v_2 = v_3$, the common ground condition, and using eq. (42), there result three independent equations for $\dot I_1$, $\dot I_2$, and $\dot I_3$. By solving these three simultaneous equations, we can determine $\dot I_3$, and from eq. (43) we can determine the inductive noise voltage, $v_n$, across the grounded wire bonds. If $\ell = 0.1$ inch, $\Delta = 0.02$ inch, $\rho = 0.0005$ inch, and $\dot I = 20$ mA/ns, the results are

$$L_s = 2.623\ \text{nH}, \qquad \dot I_3 = -6.368\dot I\ \text{mA/ns}, \qquad v_n = -111.4\ \text{mV}. \tag{44}$$

Fig. 7—Driver circuits on integrated circuit chip switching 32 signal bits simultaneously, with $v_n = -111$ mV.


Thus, from the inductive noise point of view, the configurations shown 
in Figs. 6 and 7 are comparable. 
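The following one-line consistency check of eqs. (43) and (44) assumes that eq. (43) reads $v_3 = L_s(\dot I_3/3)$. Under that assumption, the return current $\dot I_3$ is shared equally by three grounds with no mutual coupling.

```python
Ls = 2.623      # nH, eq. (44)
idot = 20.0     # mA/ns
r3 = -6.368     # I_3 / I from eq. (44)

# Assumed reading of eq. (43): v_3 = L_s * (I_3 / 3), in mV
v3 = Ls * (r3 * idot) / 3.0
print(round(v3, 1))   # -111.4, the v_n of eq. (44)
```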
In a similar manner, one can compute the inductive noise voltage across the common power or ground leads of an arbitrary array of conductors.

In general, the inductive noise voltage, $v_n$, is linear in the switching current rate, $\dot I$. Thus, if $\dot I = 10$ mA/ns, $v_n$ as given by eqs. (40) and (44) would decrease by a factor of two. Accordingly, it is very important to keep $\dot I$ no larger than is necessary for proper circuit operation.

Also, eq. (39) shows that the self-inductance, $L_s$, of the wire bonds is a major contributor to the inductive noise, $v_n$. Equation (12) shows that $L_s$ can be reduced by reducing the length, $\ell$, of the wire bonds or increasing their radius, $\rho$.

Notice that if all the mutual inductances were to vanish, the inductive noise voltages across the grounded wire bonds of Figs. 6 and 7 would increase in magnitude to

$$|v_n| = \frac{L_s(32\dot I)}{N_g} = \frac{2.623(640)}{12} \approx 140\ \text{mV}, \tag{45}$$

where $N_g$ = number of chip grounds (12 in Figs. 6 and 7).

Thus, in these cases, the mutual inductances serve to reduce the magnitude of inductive noise by about 20 percent.

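The short Python check below covers the arithmetic behind eq. (45) and the "about 20 percent" remark, using the published values. The exact product gives about 139.9 mV, which the text rounds to 140.

```python
Ls = 2.623                  # nH, wire-bond self-inductance from eq. (40)
idot = 20.0                 # mA/ns per signal lead
n_signals, n_grounds = 32, 12

v_no_mutual = Ls * n_signals * idot / n_grounds   # eq. (45), |v_n| in mV
print(f"|v_n| without mutuals: {v_no_mutual:.1f} mV")   # 139.9

for label, v in (("Fig. 6, eq. (40)", 115.2), ("Fig. 7, eq. (44)", 111.4)):
    reduction = 100.0 * (1.0 - v / v_no_mutual)
    print(f"{label}: {reduction:.1f} percent lower")    # 17.7 and 20.4
```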

5.2.3 Minimization of inductive noise 


To help minimize the magnitude of inductive noise, two general 
rules are now apparent: 

1. Separate the P leads, and separate the G leads. Attempt to locate 
them symmetrically. This serves to minimize the buildup of flux 
linkages produced by current flow in the same direction and provides 
symmetry for the P/G leads. 

2. Locate the signal leads as close as possible to the P/G leads. This 
serves to reduce flux linkages produced by current flow in opposite 
directions. 

Similar rules were also given by C. W. Deisch.* 

These two rules were applied to determine the P/G/S lead assignments for Fig. 8, which shows $\min|v_n| \le 110$ mV. In contrast, the rules were violated drastically in Fig. 9, with the result that $\max|v_n| = 196$ mV. By comparison with eq. (45), we see that the mutual inductances have now increased the magnitude of the inductive noise by at least 40 percent.

The two simple rules are useful to help minimize inductive noise 
resulting from general arrays of coupled conductors. 


5.2.4 Comparison with experiment 


Values of $v_m$ and $v_n$ as given by eqs. (24) and (26) were found to agree well with experimental values of inductive noise voltage measured on arrays of conductors in a 4 by 10 section of a circuit-pack connector.”® Details are presented in the Appendix.

Fig. 8—Driver circuits on integrated circuit chip switching 32 signal bits simultaneously, with $\min|v_n| \le 110$ mV.


5.2.5 Some generalizations 


When the array of conductors contains nonparallel conductors, eq. 
(52) of Ref. 3 can be used to generalize eq. (4) above. Also, for more 
general configurations of parallel conductors, eq. (28) of Ref. 3 gener- 
alizes eq. (4) above. These generalizations, along with the associated 
self-inductances, can also be used in eq. (26) or (28) to compute 
inductive noise voltage. 
Fig. 9—Driver circuits on integrated circuit chip switching 32 signal bits simultaneously, with $\max|v_n| = 196$ mV.


5.3 Power and ground planes 


The inductive noise voltage, $v_n$, in a power or ground plane can be computed from eq. (20). If the time rate of change of the current flowing in the power and ground planes is $\dot I_p$ mA/ns, the inductive noise voltage in either the power or ground plane is given by

$$v_n = \dot I_p L_g = 5\pi\dot I_p\,\ell\left(\frac{h}{W}\right)\ \text{mV}, \tag{46}$$

where $\ell$ is expressed in inches. For example, if $\ell = 1$ inch, $W = 1$ inch, $h = 0.005$ inch, and $\dot I_p = 200$ mA/ns, then $v_n = 15.7$ mV.
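The Python sketch below implements eqs. (20) and (46) and reproduces the 15.7 mV example. The function names are illustrative.

```python
import math


def plane_inductance_nh(length_in: float, h_in: float, width_in: float) -> float:
    """Eq. (20): inductance of the power (or ground) plane alone, nH."""
    return 5.0 * math.pi * length_in * h_in / width_in


def plane_noise_mv(idot_p: float, length_in: float, h_in: float, width_in: float) -> float:
    """Eq. (46): v_n = I_p * L_g, with I_p in mA/ns."""
    return idot_p * plane_inductance_nh(length_in, h_in, width_in)


print(round(plane_inductance_nh(1.0, 0.005, 1.0), 4))     # 0.0785 nH
print(round(plane_noise_mv(200.0, 1.0, 0.005, 1.0), 1))   # 15.7 mV
```

For comparison, the plane inductance in this example is under 0.08 nH, against 2.623 nH for a single 0.1-inch wire bond in eq. (40).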

VI. IMPEDANCE MATCH OF CHIP PACKAGES AND CIRCUIT PACKS


The computation of inductive noise as discussed in this paper applies 
when the array of conductors is considered as lumped electrical ele- 
ments. This is the case for the electrically short power and ground 
leads and electrically short segments of signal leads. However, for 
electrically long signal leads, a transmission line point of view is more 
appropriate. In this case, an important consideration is the design of 
chip packages having signal leads that are impedance matched to the 
signal leads in a circuit pack. The impedance matching of chip packages to circuit packs was treated in Ref. 7.


VII. CONCLUSIONS 


Inductive noise limits the physical design of high-speed, high pin- 
out chip packages. The general eqs. (26) and (28) derived in this paper 
are useful for computing the inductive noise resulting from the inter- 
connections in high-speed, high pin-out chip packages. When the 
distances between conductors are small relative to conductor lengths, 
the general equations reduce to the approximate equations given as 
eqs. (29) and (30). The equations are useful for computing inductive 
noise in general arrays of wire bonds, solder balls, DIP leads, package 
pins, and connector pins. Computed results were found to agree well 
with measured results. Two simple rules are presented for minimizing 
inductive noise. The inductive noise of P/G planes can also be com- 
puted. 
Fig. 10—Inductive noise model for an array of signal ($S_i$), ground ($G_i$), and idle conductors.


APPENDIX 
Experimental and Computed Inductive Noise of Interconnections 


To compare computed results with experimental results, some in- 
ductive noise measurements were made on arrays of conductors in a 4 
by 10 section of a circuit-pack connector.*® 

A general electrical model for an array of signal ($S_i$), ground ($G_i$), and idle conductors is shown in Fig. 10. The signal leads ($S_i$) are assumed to carry current rates $\dot I$, which occur simultaneously. The voltages $v_1$, $v_2$, and $v_n$ are used to characterize the inductive noise induced in the closed circuit loops. The two different ground potentials represent two equipotential surfaces (i.e., two copper ground planes).

The measurements were performed for the four grounding patterns 
shown in Fig. 11. The percentage of grounds varies from 50 percent 
for ground pattern I to 10 percent for ground pattern IV. 

The experimental results for v, are presented in Table II, along with 
the corresponding computed results for the case of a signal rise time 
of 6 ns and termination resistors of 100 Ω. The physical dimensions
used were obtained by measurements on the circuit-pack connector.°® 

The entries labeled “single” refer to the case when the average radius
(with respect to length) is taken as 0.0234 inch and the total conductor 
length is taken as 0.790 inch. 
Fig. 11—Grounding patterns I, II, III, and IV for a 4 × 10 section of a circuit-pack connector.


Table II—Comparison of experimental and computed inductive noise, with rise time = 6 ns and R = 100 Ω

           Percent v_n      Percent v_1   Percent v_2   Percent v_n
Pattern    (experimental)   (computed)    (computed)    (computed)    Model
I          0.3              -0.889        -0.481        0.204         Single
                            -0.917        -0.429        0.244         Cascade
                            -0.976        -0.492        0.242         Joint
II         0.9              -4.46         -2.34         1.06          Single
                            -4.21         -1.91         1.15          Cascade
                            -4.73         -2.36         1.19          Joint
III        4.0              -9.88         -1.18         4.35          Single
                            -9.46         -0.943        4.26          Cascade
                            -10.51        -1.19         4.66          Joint
IV         9.0              -13.83        +9.24         11.54         Single
                            -13.23        +6.63         9.93          Cascade
                            -14.64        +9.24         11.94         Joint
The entries labeled “cascade” refer to the case when the flux linkages, $\psi$, of two subsections of each conductor were added. The first subsection represents a radius of 0.015 inch and a length of 0.5 inch. The remaining subsection was of radius 0.038 inch and of length 0.290 inch.

Finally, the entries labeled “joint” refer to the case when a single $\psi$ was evaluated for each conductor having the radii and lengths given in the previous paragraph.

By comparing the experimental values of $v_n$ in Table II with the
corresponding triplet of computed values, we see that there is indeed 
good agreement in all cases. 